Zoom says it aced AI’s hardest exam. Critics say it copied off its neighbors.

Dec 16, 2025 | Technology

Zoom Video Communications, the company best known for keeping remote workers connected during the pandemic, announced last week that it had achieved the highest score ever recorded on one of artificial intelligence’s most demanding tests. The claim sent ripples of surprise, skepticism, and genuine curiosity through the technology industry.

The San Jose-based company said its AI system scored 48.1 percent on Humanity’s Last Exam, a benchmark designed by subject-matter experts worldwide to stump even the most advanced AI models. That result edges out Google’s Gemini 3 Pro, which held the previous record at 45.8 percent.

“Zoom has achieved a new state-of-the-art result on the challenging Humanity’s Last Exam full-set benchmark, scoring 48.1%, which represents a substantial 2.3% improvement over the previous SOTA result,” wrote Xuedong Huang, Zoom’s chief technology officer, in a blog post. Strictly speaking, the gain is 2.3 percentage points, or roughly a 5 percent relative improvement over 45.8 percent.

The announcement raises a provocative question that has consumed AI watchers for days. How did a video conferencing company with no public history of training large language models suddenly vault past Google, OpenAI, and Anthropic on a benchmark built to measure the frontiers of machine intelligence?

The answer reveals as much about where AI is headed as it does about Zoom’s own technical ambitions. Depending on whom you ask, it is either an ingenious demonstration of practical engineering or a hollow claim that appropriates credit for others’ work.

How Zoom built an AI traffic controller instead of training its own model

Zoom did not train its own large language model. Instead, the company developed what it calls a “federated AI approach.” The system routes queries to multiple existing models from OpenAI, Google, and Anthropic, then uses proprietary software to select, combine, and refine their outputs.

At the heart of this system sits what Zoom calls its “Z-scorer,” a mechanism that evaluates responses from different models and chooses the best one for any given task. The company pairs this with what it describes as an “explore-verify-federate strategy,” an agentic workflow that ba …
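Zoom has not published how its Z-scorer works. The minimal Python sketch below illustrates only the general idea described above: send one query to several models, score each response, and keep the highest-scoring one.

Everything in it is hypothetical:

- The model functions are canned stand-ins, not real OpenAI, Google, or Anthropic API calls.
- `federated_answer` and `toy_score` are illustrative names.
- The scoring heuristic is a placeholder, not Zoom's method.

```python
from typing import Callable, Dict, Tuple

# Hypothetical types: a "model" maps a query to a response,
# and a "scorer" rates a (query, response) pair.
Model = Callable[[str], str]
Scorer = Callable[[str, str], float]


def federated_answer(query: str, models: Dict[str, Model], score: Scorer) -> Tuple[str, str, float]:
    """Send `query` to every model, score each response, and return the best one.

    Returns (model_name, response, score). On a tie, the model listed first wins.
    """
    if not models:
        raise ValueError("at least one model is required")
    best = None
    for name, model in models.items():
        response = model(query)
        s = score(query, response)
        # Strict ">" keeps the earlier model when scores are equal.
        if best is None or s > best[2]:
            best = (name, response, s)
    return best


# Canned stand-ins for real model calls (hypothetical, for illustration only).
models: Dict[str, Model] = {
    "model_a": lambda q: "Paris",
    "model_b": lambda q: "The capital of France is Paris.",
    "model_c": lambda q: "Lyon",
}


def toy_score(query: str, response: str) -> float:
    # Placeholder heuristic, not Zoom's Z-scorer: reward answers that
    # mention "Paris", then prefer shorter answers.
    return (100 if "Paris" in response else 0) - len(response)


print(federated_answer("What is the capital of France?", models, toy_score))
# prints ('model_a', 'Paris', 95)
```

Swapping `toy_score` for a stronger judge, such as another model grading each answer, changes only the `score` argument; the routing loop stays the same. Zoom says its system also combines and refines outputs. This select-only sketch does not attempt that step.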
