Even as leading AI providers like OpenAI and Anthropic battle over the compute to train and release ever larger, more powerful models, other labs are going in a different direction, pursuing smaller, more efficient models and often open sourcing them. The latest worth paying attention to comes from the lesser-known Palo Alto startup Zyphra, which this week released its new reasoning, mixture-of-experts (MoE) language model, ZAYA1-8B, with just over 8 billion total parameters and only 760 million active, far fewer than the trillions estimated for frontier models from the big labs. Yet ZAYA1-8B remains competitive on third-party benchmarks against the likes of GPT-5-High and DeepSeek-V3.2.

It can be downloaded from Hugging Face now, free of charge, under a permissive, standard, enterprise-friendly Apache 2.0 license, and enterprises and indie developers can begin using and customizing it immediately to suit their needs. Individual users can also test it themselves for free here at Zyphra Cloud, the startup’s inference solution.

But the real headline is what ZAYA1-8B was trained on: an all-AMD stack of Instinct MI300 graphics processing units (GPUs), the Nvidia rival that AMD released nearly three years ago. The result shows that AMD’s platform is capable of producing useful models and is a viable alternative to Nvidia hardware, which has held a preferential position among AI model developers in recent years.

How ZAYA1-8B was trained

The “intelligence density” touted by Zyphra is the result of what they describe as a “full-stack innovation” approach, spanning architecture, pretraining, and reinforcement learning (RL).

ZAYA1-8B is built on Zyphra’s proprietary MoE++ architecture, described in a technical report released by the lab. This architecture introduces three fundamental changes to the standard Transformer architecture that gave rise to large language models (LLMs) and the entire generative AI era:

Compressed Convolutional Attention (CCA): Unlike standard attention mechanisms that strug …