Sakana AI’s TreeQuest: Deploy multi-model teams that outperform individual LLMs by 30%

Jul 3, 2025 | Technology


Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to cooperate on a single task, effectively creating a “dream team” of AI agents. The method, called Multi-LLM AB-MCTS, lets models work by trial and error and combine their unique strengths to solve problems that are too complex for any individual model.

For enterprises, this approach provides a means to develop more robust and capable AI systems. Instead of being locked into a single provider or model, businesses could dynamically leverage the best aspects of different frontier models, assigning the right AI for the right part of a task to achieve superior results.

The power of collective intelligence

Frontier AI models are evolving rapidly. However, each model has its own distinct strengths and weaknesses derived from its unique training data and architecture. One might excel at coding, while another excels at creative writing. Sakana AI’s researchers argue that these differences are not a bug, but a feature.

“We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence,” the researchers state in their blog post. They believe that just as humanity’s greatest achievements come from diverse teams, AI systems can also achieve more by working together. “By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model.”

Thinking longer at inference time

Sakana AI’s new algorithm is an “inference-time scaling” technique (also referred to as “test-time scaling”), an area of research that has become very popular in the past year. While most of the focus in AI has been on “training-time scaling” (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model is already trained. 
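
To make the idea concrete, here is a minimal sketch of one simple form of inference-time scaling, best-of-N sampling: draw several candidate answers from a fixed model and keep the highest-scoring one. This is not Sakana AI's AB-MCTS algorithm, only an illustration of spending more compute after training. The names `toy_generate`, `toy_score`, and `best_of_n` are hypothetical, and the "model" is a random-number stand-in rather than a real LLM call.

```python
import random
from typing import Callable, Tuple

# Assumption: a hidden "correct answer" that the toy scorer can measure against.
TARGET = 42


def toy_generate(prompt: str, rng: random.Random) -> int:
    """Hypothetical stand-in for one LLM sample: a random guess in [0, 100]."""
    return rng.randint(0, 100)


def toy_score(answer: int) -> float:
    """Hypothetical scorer: 0 is perfect, more negative is worse."""
    return -float(abs(answer - TARGET))


def best_of_n(
    generate: Callable[[str, random.Random], int],
    score: Callable[[int], float],
    prompt: str,
    n: int,
    seed: int = 0,
) -> Tuple[int, float]:
    """Sample n candidates from a fixed 'model' and return the best one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    candidates = [generate(prompt, rng) for _ in range(n)]
    best = max(candidates, key=score)
    return best, score(best)


if __name__ == "__main__":
    # A larger sampling budget means more inference-time compute per question.
    for budget in (1, 4, 16, 64):
        answer, quality = best_of_n(toy_generate, toy_score, "guess the number", budget)
        print(f"n={budget:>2}  best answer={answer}  score={quality}")
```

Because the seed is fixed, each larger budget reuses the samples of the smaller one and adds more, so the best score can only stay the same or improve as the budget grows. Real systems replace the toy generator with model calls and the toy scorer with a verifier, test suite, or reward model.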

One common approach involves using reinforcement learning to prompt models to generate lon …
