Meta’s SPICE framework lets AI systems teach themselves to reason

Nov 11, 2025 | Technology

Researchers at Meta FAIR and the National University of Singapore have developed a new reinforcement learning framework for self-improving AI systems. Called Self-Play In Corpus Environments (SPICE), the framework pits two AI agents against each other, creating its own challenges and gradually improving without human supervision. While currently a proof of concept, this self-play mechanism could provide a basis for future AI systems that dynamically adapt to their environments, making them more robust against the unpredictability of real-world applications.

The challenge of self-improving AI

The goal of self-improving AI is to create systems that can enhance their capabilities by interacting with their environment. A common approach is reinforcement learning with verifiable rewards (RLVR), where models are rewarded for providing correct answers to problems. This approach, however, is limited by its reliance on human-curated problem sets and domain-specific reward engineering, which makes it difficult to scale.

Self-play, where a model improves by competing against itself, is another promising paradigm. But existing self-play methods for language models are often limited by two critical factors. First, factual errors in generated questions and answers compound, leading to a feedback loop of hallucinations. Second, when the problem generator and solver have information symmetry (i.e., they share the same knowledge base), they fail to generate genuinely new challenges and fall into repetitive patterns. As the researchers note in their paper, “These systematic empirical failures indicate that self-improvement requires interaction with an external source providing diverse, verifiable feedback, rather than closed-loop pure introspection.”

How SPICE works

SPICE is a self-play framework in which a single model acts in two distinct roles. A “Challenger” constructs a curriculum of challenging problems from a large corpus of documents. A “Reasoner” then attempts to solve these problems without access to the source documents. This setup breaks the information symmetry that limits other self-play methods, as the Reasoner does not have access to the documents and knowledge that the …
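The two-role mechanism lends itself to a compact sketch. The Python below is a minimal, hypothetical illustration of one self-play round, assuming a `model` object with `generate_challenge` and `attempt_solution` methods and an external `rl_update` routine; these names and the simplified reward rules are placeholders for exposition, not the actual API or training objective from the SPICE paper.

```python
# Illustrative sketch of one SPICE-style self-play step, as described above.
# Method names and reward rules are assumptions, not the paper's details.

import random
from dataclasses import dataclass


@dataclass
class Challenge:
    question: str
    reference_answer: str  # grounded in the source document, so it is checkable


def spice_self_play_step(model, corpus, rl_update):
    """Run one Challenger/Reasoner round with a single shared model."""
    # Challenger role: sample a document and pose a question whose answer
    # can be verified against the document itself.
    document = random.choice(corpus)
    challenge: Challenge = model.generate_challenge(document)

    # Reasoner role: answer WITHOUT seeing the document, which breaks the
    # information symmetry that traps closed-loop self-play.
    predicted = model.attempt_solution(challenge.question)

    # Verifiable reward for the Reasoner: correctness against the
    # corpus-grounded reference answer, with no human-curated labels.
    reasoner_reward = 1.0 if predicted.strip() == challenge.reference_answer.strip() else 0.0

    # Simplified stand-in for the Challenger's objective: favour problems
    # the Reasoner cannot yet solve, so the curriculum keeps getting harder.
    challenger_reward = 1.0 - reasoner_reward

    # One RL update applied to the single model acting in both roles.
    rl_update(model, challenger_reward, reasoner_reward)
```

In the framework itself, both roles are trained with reinforcement learning on the same underlying model; the sketch collapses that into a single `rl_update` call for brevity.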
