TTT-Discover optimizes GPU kernels 2x faster than human experts — by training during inference

Feb 5, 2026 | Technology

Researchers from Stanford, Nvidia, and Together AI have developed a new technique that can discover novel solutions to very complex problems. For example, they managed to optimize a critical GPU kernel to run 2x faster than the previous state-of-the-art written by human experts.

Their technique, called “Test-Time Training to Discover” (TTT-Discover), challenges the current paradigm of letting models “think longer” on reasoning problems. TTT-Discover allows the model to continue training during inference, updating its weights for the problem at hand.

The limits of ‘frozen’ reasoning

Current enterprise AI strategies often rely on “frozen” models. Whether you use a closed or an open reasoning model, the model’s parameters are static. When you prompt these models, they search for answers within the fixed manifold of their training data. This works well for problems that resemble what the model has seen before.

However, true discovery problems, like inventing a novel algorithm or proving a new mathematical theorem, are by definition out-of-distribution. If the solution requires a leap of logic that doesn’t exist in the training set, a frozen model will likely fail, no matter how much compute you throw at it during inference.

In comments to VentureBeat, Mert Yuksekgonul, a co-author of the paper and a doctoral student at Stanford, illustrated this distinction using a famous mathematical breakthrough: “I believe that thinking models wouldn’t be able to prove, for example, P != NP, without test-time training, just like Andrew Wiles wouldn’t be able to prove Fermat’s Last Theorem without the 7 years he spent pursuing this single problem in isolation and continuously learning from his own failures.”

TTT-Discover treats the test problem …
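One way to picture the idea of training during inference is a loop that samples a candidate solution, scores it with an external evaluator, and nudges the weights toward higher-scoring attempts. The sketch below illustrates that pattern under stated assumptions, not the paper’s actual method: it assumes a Hugging Face-style causal language model, a hypothetical `evaluate_attempt` scorer, and a naive score-weighted update that stands in for whatever objective TTT-Discover uses.

```python
import torch

def test_time_train(model, tokenizer, problem_prompt, evaluate_attempt,
                    steps=32, lr=1e-5, max_new_tokens=512):
    """Adapt a model to a single problem by learning from its own attempts.

    `evaluate_attempt` is a hypothetical external scorer (e.g. it compiles a
    candidate kernel and returns its measured speedup as a float).
    """
    model.train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    best_score, best_solution = float("-inf"), None

    inputs = tokenizer(problem_prompt, return_tensors="pt")
    for _ in range(steps):
        # Sample a candidate solution from the current (still-updating) weights.
        with torch.no_grad():
            output_ids = model.generate(**inputs, do_sample=True,
                                        max_new_tokens=max_new_tokens)
        attempt = tokenizer.decode(output_ids[0], skip_special_tokens=True)

        score = evaluate_attempt(attempt)  # external feedback signal
        if score > best_score:
            best_score, best_solution = score, attempt

        # Naive score-weighted update: re-run the forward pass and scale the
        # sequence's negative log-likelihood by the score, so high-scoring
        # attempts become more likely. (A real objective would subtract a
        # baseline and mask the prompt tokens, among other refinements.)
        nll = model(output_ids, labels=output_ids).loss
        loss = score * nll

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return best_solution, best_score
```

In the kernel-optimization setting described above, the scorer would presumably compile and time the generated kernel; the key point is that feedback from each attempt changes the weights themselves, not just the prompt.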
