When the transformer architecture was introduced in 2017 in the now-seminal Google paper “Attention Is All You Need,” it became an instant cornerstone of modern artificial intelligence. Every major large language model (LLM), from OpenAI’s GPT series to Anthropic’s Claude, Google’s Gemini, and Meta’s Llama, has been built on some variation of its central mechanism: attention, the mathematical operation that allows a model to look back across its entire input and decide what information matters most.

Eight years later, the same mechanism that defined AI’s golden age is now showing its limits. Attention is powerful, but it is also expensive: its computational and memory costs scale quadratically with context length, creating an increasingly unsustainable bottleneck for both research and industry. As models aim to reason across documents, codebases, or video streams lasting hours or days, attention becomes the architecture’s Achilles’ heel.

On October 28, 2025, the little-known AI startup Manifest AI introduced a radical alternative. Their new model, Brumby-14B-Base, is a retrained variant of Qwen3-14B-Base, one of the leading open-source transformer models.

But while many variants of Qwen have been trained already, Brumby-14B-Base is novel in that it abandons attention altogether.

Instead, Brumby replaces the attention layers with a new mechanism called Power Retention: a recurrent, hardware-efficient architecture that stores and updates information over arbitrarily long contexts without the quadratic memory growth of attention.

Trained at a stated cost of just $4,000, the 14-billion-parameter Brumby model performs on par with established transformer models like Qwen3-14B and GLM-4.5-Air, achieving near-state-of-the-art accuracy on a range of reasoning and comprehension …
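To make the contrast concrete, here is a minimal, illustrative sketch in Python/NumPy. It is not Manifest AI’s published Power Retention math, which is not reproduced in this article; the `recurrent_retention` function below is an assumed, generic linear-attention-style recurrence, included only to show why a fixed-size recurrent state sidesteps the quadratic memory profile of standard attention.

```python
# Illustrative sketch only: not the exact Power Retention update used in Brumby.
# It contrasts the memory profile of standard attention, which materializes an
# (n x n) score matrix, with a generic recurrent update that keeps a fixed-size state.
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: builds an (n x n) score matrix, so activation
    memory grows quadratically with sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                      # (n, d)

def recurrent_retention(Q, K, V):
    """Generic recurrent alternative (assumed form): a (d x d) state S is
    updated once per token, so memory stays constant regardless of n."""
    n, d = Q.shape
    S = np.zeros((d, d))                                    # fixed-size state
    out = np.empty_like(V)
    for t in range(n):
        S += np.outer(K[t], V[t])                           # fold token t into the state
        out[t] = Q[t] @ S                                   # read out with the query
    return out

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
_ = softmax_attention(Q, K, V)     # peak activation memory ~ n * n floats
_ = recurrent_retention(Q, K, V)   # peak activation memory ~ d * d floats
```

In the recurrent version the per-token cost and the state size are independent of how many tokens came before, which is the property that lets architectures of this family process arbitrarily long contexts at constant memory.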