IBM today announced the release of Granite 4.0, the newest generation of its homegrown family of open source large language models (LLMs), designed to balance high performance with lower memory and cost requirements.

Despite being one of the oldest active tech companies in the U.S. (founded in 1911, 114 years ago!), “Big Blue,” as it’s often nicknamed, has already wowed many AI industry workers and followers with this new Granite 4.0 family of LLMs. The models offer high performance on third-party benchmarks; a permissive, business-friendly license (Apache 2.0) that allows developers and enterprises to freely take, modify, and deploy them for their own commercial purposes; and, perhaps most importantly, they symbolically put the U.S. back in a competitive position against the growing raft of high-performing, new-generation open source Chinese LLMs, especially from Alibaba’s prolific Qwen team, alongside OpenAI with its gpt-oss model family released earlier this summer.

Meta, the parent company of Facebook and Instagram, was once seen as the world and U.S. leader in open source LLMs with its Llama models. But after the disappointing release of the Llama 4 family in April, and the absence of its planned, most powerful variant, Llama 4 Behemoth, it has since pursued a different strategy: it is now partnering with outside labs like Midjourney on AI products, while continuing to build out an expensive, in-house AI “Superintelligence” team as well.

Little wonder AI engineer Alexander Doria (aka Pierre-Carl Langlais) observed, with a hilarious Lethal Weapon meme, that “ibm suiting up again after llama 4 fumbled” and that “we finally have western qwen.”

Hybrid (Transformer/Mamba) theory

At the heart of IBM’s Granite 4.0 release is a new hybrid design that combines two very different architectures, or underlying organizational structures, for the LLMs in question: transformers and Mamba.

Transformers, introduced in 2017 by Vaswani and colleagues in the famous Google paper “Attention Is All You Need,” power most large language models in use today. In this design, every token (essentially a small chunk of text, like a word or part of a word) can compare itself to every other token in the input. This “all-to-all” comparison is what gives transformers their strong ability to capture context and meaning across a passage. The trade-off is efficiency: because the model must calculate relationships between every possible pair of tokens, the cost in compute and memory grows quadratically as the input gets longer.
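To make that quadratic scaling concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. This is an illustration of the mechanism described above, not IBM’s code; real transformers add learned query/key/value projections and multiple attention heads, which are omitted here for brevity.

```python
# Minimal sketch of self-attention's all-to-all token comparison.
# Assumes plain NumPy; learned projections and multi-head logic omitted.
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """x: (seq_len, d_model) array of token embeddings."""
    d = x.shape[-1]
    # (seq_len, seq_len) matrix: every token scored against every other token
    scores = x @ x.T / np.sqrt(d)
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of all token vectors
    return weights @ x

tokens = np.random.randn(8, 16)  # 8 tokens, 16-dim embeddings
out = self_attention(tokens)
print(out.shape)  # (8, 16)
```

The (seq_len, seq_len) score matrix is the bottleneck: doubling the input length quadruples the number of pairwise comparisons, which is exactly the quadratic cost described above.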