Microsoft is doubling down on the potential of small language models (SLMs) with the unveiling of rStar-Math, a new reasoning technique that can be applied to small models to boost their performance on math problems, with results similar to, and in some cases exceeding, those of OpenAI’s o1-preview model.
While still in a research phase, as outlined in a paper published on the preprint site arXiv.org and credited to eight authors at Microsoft, Peking University and Tsinghua University in China, the technique was applied to several smaller open-source models, including Microsoft’s own Phi-3 mini, Alibaba’s Qwen-1.5B (a 1.5-billion-parameter model), and Qwen-7B (a 7-billion-parameter model). It improved performance on all of them, even exceeding OpenAI’s previously most advanced model on MATH, a third-party word-problem benchmark of 12,500 questions covering branches such as geometry and algebra at all levels of difficulty.
Ultimately, according to a post on Hugging Face, the researchers plan to make their code and data available on GitHub at https://github.com/microsoft/rStar, though one of the paper’s authors, Li Lyna Zhang, wrote in the comments on the Hugging Face post that the team is “still undergoing the internal review process for open-source release.” As such, “the repository remains private for now. Please stay tuned!”
Community members expressed enthusiasm, calling the innovations “impressive” and praising the blend of Monte Carlo Tree Search (MCTS) with step-by-step reasoning. One commenter highlighted the simplicity and utility of using Q-values for step scoring, while others speculated on future applications in geometric proofs and symbolic reasoning.
This news follows closely on the heels of the open-sourcing of Microsoft’s Phi-4 model, a smaller 14-billion-parameter AI system now available on Hugging Face under the permissive MIT license.
While the Phi-4 release has expanded access to high-performance small models, rStar-Math showcases a specialized approach: using smaller AI systems to achieve state-of-the-art results in mathematical reasoning.
rStar-Math works by using several different models and components to help a target small model ‘self-evolve’
The key to rStar-Math is that it leverages Monte Carlo Tree Search (MCTS), a method that mimics human “deep thinking” by iteratively refining step-by-step solutions to mathematical problems.
The researchers used MCTS because it “breaks down complex math problems into simpler single-step generation tasks, reducing the difficulty” for smaller models.
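To make the idea concrete, here is a minimal sketch of generic MCTS with per-step Q-values. It is not the researchers' code. It assumes a hypothetical toy task (reach a target number from a start number using the single steps "+1" and "*2") as a stand-in for the steps of a math solution, and every name in it (`Node`, `search`, `best_path`, `START`, `TARGET`) is invented for illustration. In this sketch a random rollout scores each step; in a real system that role would be played by models.

```python
import math
import random

# Hypothetical toy task (an assumption, not from the paper): reach TARGET
# from START by applying one simple step at a time.
START, TARGET, MAX_DEPTH = 1, 10, 6
STEPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


class Node:
    def __init__(self, value, depth, parent=None, step=None):
        self.value, self.depth = value, depth
        self.parent, self.step = parent, step
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

    @property
    def q(self):
        # Q-value of a step: mean reward of all simulations that went through it.
        return self.total_reward / self.visits if self.visits else 0.0

    def is_terminal(self):
        return self.value >= TARGET or self.depth >= MAX_DEPTH


def uct(child, parent_visits, c=1.4):
    # Unvisited steps are tried first; otherwise balance Q-value and exploration.
    if child.visits == 0:
        return float("inf")
    return child.q + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(node, rng):
    # Finish the partial solution with random steps; reward 1.0 only on success.
    value, depth = node.value, node.depth
    while value < TARGET and depth < MAX_DEPTH:
        value = rng.choice(list(STEPS.values()))(value)
        depth += 1
    return 1.0 if value == TARGET else 0.0


def search(iterations=500, seed=0):
    rng = random.Random(seed)
    root = Node(START, 0)
    for _ in range(iterations):
        node = root
        # Selection: descend through already-expanded steps by UCT score.
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # Expansion: add one child per candidate next step.
        if not node.is_terminal() and node.visits > 0:
            for name, fn in STEPS.items():
                node.children.append(Node(fn(node.value), node.depth + 1, node, name))
            node = node.children[0]
        # Simulation, then backpropagation of the reward up to the root.
        reward = rollout(node, rng)
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return root


def best_path(root):
    # Read out a solution by following the highest-Q step at each level.
    path, node = [], root
    while node.children:
        node = max(node.children, key=lambda ch: ch.q)
        path.append(node.step)
    return path, node.value


if __name__ == "__main__":
    steps, final_value = best_path(search())
    print(steps, final_value)
```

The point of the sketch is the decomposition: the search never has to produce a whole solution at once, only choose the next single step, and the Q-values accumulated during search give each step a score that can be inspected afterward.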
However, they didn’t just apply MCTS as other researchers …