New ‘Markovian Thinking’ technique unlocks a path to million-token AI reasoning

by | Oct 21, 2025 | Technology

Researchers at Mila have proposed a new technique that makes large language models (LLMs) vastly more efficient when performing complex reasoning. Called Markovian Thinking, the approach allows LLMs to engage in lengthy reasoning without incurring the prohibitive computational costs that currently limit such tasks.The team’s implementation, an environment named Delethink, structures the reasoning chain into fixed-size chunks, breaking the scaling problem that plagues very long LLM responses. Initial estimates show that for a 1.5B parameter model, this method can cut the costs of training by more than two-thirds compared to standard approaches.The quadratic curse of long-chain reasoningFor an LLM to solve a complex problem, it often needs to generate a long series of intermediate “thinking” tokens, often referred to as chain-of-thought (CoT). In recent years, researchers have found that using reinforcement learning (RL) to train models to produce longer CoTs (sometimes referred to as LongCoT) has significantly improved their reasoning capabilities.However, the standard method for this has a critical flaw: The AI’s “state” (the prompt plus all the reasoning tokens it has generated thus far in its processing) grows with every new reasoning token. For modern transformer-based models, this means the computational cost explodes quadratically as the reasoning chain gets longer, making it prohibitively expensive to train models for very complex tasks.Most current attempts to manage this cost focus on limiting how much thinking the model does, implicitly preferring shorter solutions or terminating the process early. While these methods offer some relief, the Mila researchers still operate within the LongCoT framework and are thus fundamentally bound by its quadratic nature.Instead of trying to control the computational growth, Mila created an RL environment that avoids the quadratic problem altogether. As co-author Amirhossein Kazemnejad explained, the goal is to enable capabilities like multi-week reas …

Article Attribution | Read More at Article Source