A new study by Google suggests that advanced reasoning models achieve high performance by simulating multi-agent-like debates involving diverse perspectives, personality traits, and domain expertise. Their experiments demonstrate that this internal debate, which they dub “society of thought,” significantly improves model performance in complex reasoning and planning tasks.

The researchers found that leading reasoning models such as DeepSeek-R1 and QwQ-32B, which are trained via reinforcement learning (RL), inherently develop this ability to engage in society of thought conversations without explicit instruction.

These findings offer a roadmap for how developers can build more robust LLM applications and how enterprises can train superior models using their own internal data.

What is society of thought?

The core premise of society of thought is that reasoning models learn to emulate social, multi-agent dialogues to refine their logic. This hypothesis draws on cognitive science, specifically the idea that human reason evolved primarily as a social process to solve problems through argumentation and engagement with differing viewpoints.

The researchers write that “cognitive diversity, stemming from variation in expertise and personality traits, enhances problem solving, particularly when accompanied by authentic dissent.” Consequently, they suggest that integrating diverse perspectives allows LLMs to develop robust reasoning strategies. By simulating conversations between different internal personas, models can perform essential checks (such as verification and backtracking) that help avoid common pitfalls like unwanted biases and sycophancy.

In models like DeepSeek-R1, this “society” manifests directly within the chain of thought. The researchers note that you do not need separate models or prompts to force this interaction; the debate emerges autonomously within the reasoning process of a single model instance.

Examples of society of thought

The study provides tangible examples of how this interna …