Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

by | May 20, 2026 | Technology

Canadian AI lab Cohere made waves recently by announcing a merger with German AI startup Aleph Alpha, but it now has even more in store for enterprise builders around the globe. Today, the firm co-founded by former Googler and “Attention Is All You Need” co-author Aidan Gomez unveiled Command A+, a highly optimized, 218-billion-parameter language model engineered specifically for complex reasoning, multimodal document processing, and agentic workflows.

The most significant aspect of the release is not just the model’s capabilities; it is its accessibility. By releasing the model weights for free on the popular AI code sharing repository Hugging Face under a highly permissive Apache 2.0 open-source license (a first for the company, according to a post on X by Gomez, now Cohere’s CEO), Cohere is making a calculated bet on “sovereign AI”: the thesis that enterprises, governments, and developers should be able to run, control, and adapt frontier-grade AI entirely within their own secure environments, without sacrificing performance.

Sparse architecture with extreme quantization

At the architectural level, Command A+ represents a major evolution from Cohere’s previous dense models. It is a decoder-only Sparse Mixture-of-Experts (MoE) Transformer. While the model houses a relatively modest 218 billion total parameters, even fewer, only 25 billion, are active during any given generation step. That is a much lighter footprint, and it requires far less compute for inference (serving the model in production environments to end users or via agents) than proprietary models from U.S. giants, such as OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7, which third-party observers estimate to be in the trillions of parameters. This sparse architecture is the key to the model’s efficiency.
In plain terms, an MoE model routes each piece of incoming input only to the specific “expert” neural networks best suited to handle it, leaving the rest of the model dormant. This is a familiar formulation, and one followed by most leading LLMs these days, allowing models to retain the vast knowledge base and nuanced reasoning capabilities of a giant, …
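
To make the routing idea concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not Cohere’s implementation: the expert count, the `toy_router` scoring rule, the scaling "experts", and the choice of k=2 are all hypothetical, picked only for illustration. The only figures taken from the article are the 218 billion total and 25 billion active parameters.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    order = sorted(range(len(router_scores)),
                   key=lambda i: router_scores[i], reverse=True)
    top = order[:k]
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))

def moe_layer(token, experts, router, k=2):
    """Run only the chosen experts; every other expert stays dormant."""
    chosen = route_top_k(router(token), k)
    out = [0.0] * len(token)
    for idx, weight in chosen:
        expert_out = experts[idx](token)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, [idx for idx, _ in chosen]

# Hypothetical toy setup: 8 "experts" that simply scale their input.
NUM_EXPERTS = 8

def make_expert(scale):
    return lambda token: [scale * x for x in token]

experts = [make_expert(float(i + 1)) for i in range(NUM_EXPERTS)]

def toy_router(token):
    # Assumed scoring rule: expert i scores higher the closer i is to sum(token).
    s = sum(token)
    return [-abs(s - i) for i in range(NUM_EXPERTS)]

token = [1.0, 2.2]
output, used = moe_layer(token, experts, toy_router, k=2)
print("experts used:", used, "of", NUM_EXPERTS)

# Share of parameters active per generation step, using the article's figures.
active_fraction = 25e9 / 218e9
print(f"active share: {active_fraction:.1%}")
```

In this toy run only two of the eight experts do any work for the token, which is the same reason a sparse model’s per-step compute tracks its active parameter count rather than its total. Using the article’s figures, about 11.5 percent of Command A+’s parameters are active at each generation step.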
