How RecursiveMAS speeds up multi-agent inference by 2.4x and reduces token usage by 75%

May 15, 2026 | Technology

One of the key challenges of current multi-agent AI systems is that they communicate by generating and sharing text sequences, which introduces latency, drives up token costs, and makes it difficult to train the entire system as a cohesive unit. To overcome this challenge, researchers at the University of Illinois Urbana-Champaign and Stanford University developed RecursiveMAS, a framework that enables agents to collaborate and transmit information through embedding space instead of text. This change yields both efficiency and performance gains: experiments show that RecursiveMAS improves accuracy across complex domains such as code generation, medical reasoning, and search, while also increasing inference speed and slashing token usage. RecursiveMAS is also significantly cheaper to train than standard full fine-tuning or LoRA methods, making it a scalable and cost-effective blueprint for custom multi-agent systems.

The challenges of improving multi-agent systems

Multi-agent systems can help tackle complex tasks that single-agent systems struggle to handle. When scaling multi-agent systems for real-world applications, a major challenge is enabling the system to evolve, improve, and adapt to different scenarios over time.

Prompt-based adaptation improves agent interactions by iteratively refining the shared context provided to the agents. By updating the prompts, the system acts as a director, guiding the agents to generate responses that are more aligned with the overarching goal. The fundamental limitation is that the capabilities of the models underlying each agent remain static.

A more sophisticated approach is to train the agents by updating the weights of the underlying models. Training an entire system of agents is difficult because updating …
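To make the core idea concrete, here is a minimal, hypothetical sketch of the difference between text-based and embedding-based agent communication. The `Agent` class, its dimensions, and the linear encoder are illustrative inventions, not RecursiveMAS's actual architecture: the point is simply that a sender's hidden state can be consumed directly by the receiver, skipping the decode-to-text and re-encode round trip that drives token costs.

```python
import numpy as np

rng = np.random.default_rng(0)


class Agent:
    """Toy agent: a single linear layer standing in for an LLM's encoder.

    In a real text-based pipeline, `hidden` would be followed by an
    expensive autoregressive decode into tokens, which the next agent
    would then re-encode from scratch.
    """

    def __init__(self, d_in: int, d_hidden: int):
        # Random weights as a stand-in for pretrained parameters.
        self.W = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)

    def hidden(self, x: np.ndarray) -> np.ndarray:
        """Map an input vector to this agent's hidden representation."""
        return np.tanh(self.W @ x)


def collaborate_via_embedding(
    sender: Agent, receiver: Agent, x: np.ndarray
) -> np.ndarray:
    """Pass the sender's hidden state straight into the receiver.

    No tokens are generated or parsed between the two agents; the
    embedding itself is the message.
    """
    h = sender.hidden(x)        # sender's internal representation
    return receiver.hidden(h)   # receiver consumes it directly


# Usage: an 8-dim task input flows through a 16-dim sender embedding
# into a 4-dim receiver output, with zero intermediate tokens.
sender = Agent(d_in=8, d_hidden=16)
receiver = Agent(d_in=16, d_hidden=4)
out = collaborate_via_embedding(sender, receiver, rng.normal(size=8))
```

Because the message is a fixed-size vector rather than a variable-length token sequence, the per-hop cost is constant and the whole chain is differentiable end to end, which is what makes joint training of the system tractable.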
