Meituan open sources LongCat-2.0, the 1.6T, near-frontier agentic coding model that’s been leading OpenRouter — trained entirely on Chinese chips

by | Jun 30, 2026 | Technology

A few hours ago, Chinese delivery app company Meituan officially unveiled LongCat-2.0 on GitHub, Hugging Face, and its native platform, unmasking the model as the engine behind “Owl Alpha,” the anonymous stealth model that has spent the last two months at the top of global developer charts on OpenRouter. Built to challenge closed-source dominance in autonomous software engineering, the 1.6-trillion-parameter Mixture-of-Experts (MoE) system brings a native 1-million-token context window to the public under a permissive, commercially usable MIT license.

Commercial access comes with aggressive pricing: all context-cache hits are processed free of charge, alongside a time-limited “Token Pack” flash sale. There is also a standard pay-as-you-go API for non-cache hits, priced at $0.75 per million input tokens and $2.95 per million output tokens. However, a limited-time promotional discount cuts these rates to $0.30 per million tokens for uncached input and $1.20 per million tokens for output, both on the cheaper end among top-performing models globally.
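
To make the pricing above concrete, here is a minimal Python sketch of a per-request cost estimate. The two price tiers come from the article; the function name, the request sizes, and the assumption that cached input tokens are simply billed at $0 while the remainder is billed at the uncached input rate are illustrative, not Meituan's published billing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in US dollars per one million tokens."""
    input_per_m: float
    output_per_m: float


# Rates quoted in the article.
STANDARD = Pricing(input_per_m=0.75, output_per_m=2.95)
PROMO = Pricing(input_per_m=0.30, output_per_m=1.20)


def request_cost(pricing, input_tokens, cached_tokens, output_tokens):
    """Estimate the dollar cost of one request.

    Assumption: cache hits cost nothing, so only the uncached part of
    the input is charged at the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = uncached * pricing.input_per_m + output_tokens * pricing.output_per_m
    return total / 1_000_000


# Hypothetical agentic request: 200K-token prompt, 150K of it served from
# cache, 4K tokens generated.
for label, tier in [("standard", STANDARD), ("promo", PROMO)]:
    print(label, round(request_cost(tier, 200_000, 150_000, 4_000), 4))
```

Under these assumptions the hypothetical request costs about $0.049 at the standard rate and about $0.020 at the promotional rate. For long agentic sessions that re-send the same repository context, the cached share of the prompt dominates the bill.
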

| Model | Input ($/1M) | Output ($/1M) | Total ($/1M) | Source |
|---|---|---|---|---|
| MiMo-V2.5 Flash | $0.10 | $0.30 | $0.40 | Xiaomi |
| deepseek-v4-flash | $0.14 | $0.28 | $0.42 | DeepSeek |
| deepseek-v4-pro | $0.435 | $0.87 | $1.305 | DeepSeek |
| MiniMax-M3 | $0.30 | $1.20 | $1.50 | MiniMax |
| LongCat-2.0 (limited-time promo) | $0.30 | $1.20 | $1.50 | LongCat |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 | $1.75 | Google |
| Qwen3.7-Plus | $0.40 | $1.60 | $2.00 | Alibaba Cloud |
| MiMo-V2.5 | $0.40 | $2.00 | $2.40 | Xiaomi |
| LongCat-2.0 (standard) | $0.75 | $2.95 | $3.70 | LongCat |
| Grok 4.3 (low context) | $1.25 | $2.50 | $3.75 | xAI |
| MiMo-V2.5 Pro (≤256K) | $1.00 | $3.00 | $4.00 | Xiaomi |
| Kimi-K2.6 | $0.95 | $4.00 | $4.95 | Moonshot AI |
| GLM-5.2 | $1.40 | $4.40 | $5.80 | Z.ai |
| GPT-5.6 Luna | $1.00 | $6.00 | $7.00 | OpenAI |
| Grok 4.3 (high context) | $2.50 | $5.00 | $7.50 | xAI |
| MiMo-V2.5 Pro (>256K) | $2.00 | $6.00 | $8.00 | Xiaomi |
| Qwen3.7-Max | $2.50 | $7.50 | $10.00 | Alibaba Cloud |
| Gemini 3.5 Flash | $1.50 | $9.00 | $10.50 | Google |
| Gemini 3.1 Pro Preview (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.6 Terra | $2.50 | $15.00 | $17.50 | OpenAI |
| GPT-5.4 | $2.50 | $15.00 | $17.50 | OpenAI |
| Gemini 3.1 Pro Preview (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.8 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.5 | $5.00 | $30.00 | $35.00 | OpenAI |
| GPT-5.5 Instant (chat-latest) | $5.00 | $30.00 | $35.00 | OpenAI |
| Sakana Fugu Ultra (≤272K) | $5.00 | $30.00 | $35.00 | Sakana AI |
| GPT-5.6 Sol | $5.00 | $30.00 | $35.00 | OpenAI |
| Claude Fable 5 / Claude Mythos 5 | $10.00 | $50.00 | $60.00 | Anthropic |

What makes the release a definitive inflection point for global tech infrastructure is its operational independence: the massive model was trained entirely on a cluster of over 50,000 domestic Chinese Application-Specific Integrated Circuits (ASICs), proving that near-frontier AI models can be scaled successfully without relying on the typical U.S. Nvidia GPUs that have, to date, powered much of the global generative AI frontier model training effort. This successful deployment of alternative silicon signals a profound structural shift. If Chinese conglomerates can consistently iterate trillion-parameter architectures using homegrown ASICs rather than general-purpose GPUs, it would seem to threaten Nvidia’s dominance in this sector.

Crucially, this technological pivot arrives precisely as Washington pressures top-tier American labs to restrict access to their latest models. Following a U.S. governmental request, OpenAI was forced to limit access to its new GPT-5.6 models, while Anthropic was previously also ordered by the U.S. to restrict access to its latest Claude Fable 5 / Mythos 5 models, which it took entirely offline in response. At the same time, a growing chorus of technologists, activists, and industry experts warn that these defensive regulatory maneuvers have inadvertently backfired. By locking down Western closed-source models and driving up API costs, the U.S. government has left a wide operational window for global developers seeking affordable, high-performance alternatives like those found in Chinese open source models such as Meituan LongCat-2.0.

The raw operational metrics backed up the developer enthusiasm: during its unbranded residency on OpenRouter, Owl Alpha accounted for approximately 10.1 trillion monthly tokens (averaging 559 billion tokens per day), representing a 242% month-over-month explosion in volume that propelled it into the platform’s global top three.

By the time Meituan stepped forward to claim the architecture, the model had already secured the top ranking on the Hermes Agent workspace, second place on Claude Code deployments, and third place across international OpenClaw environments.

Technology: Engineering the 1M-Token Sparse Context

At the core of LongCat-2.0 lies an aggressive optimization of Mixture-of-Experts (MoE) sparsity, scaling total parameters to 1.6 trillion while limiting active computation to an average of 48 billion parameters per token. Depending on the structural complexity of a query, the model’s dynamic activation ranges from 33 billion to 56 billion parameters.
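
The sparsity figures above can be illustrated with a toy router in Python. The sketch assumes one possible mechanism for a per-token active-parameter count that varies: some of the experts a token can be routed to are parameter-free identity experts, so a token that lands on them uses less compute. The expert sizes, the `top_k` and `active_params` helpers, and the router scores are all hypothetical; the article does not describe LongCat-2.0's routing at this level of detail.

```python
# Toy illustration only: the sizes below are hypothetical, not LongCat-2.0's.
SHARED_PARAMS = 500  # parameters every token uses (attention, embeddings, ...)
# Four ordinary experts plus two identity experts that hold no parameters.
EXPERT_PARAMS = [1_000, 1_000, 1_000, 1_000, 0, 0]


def top_k(scores, k):
    """Return the indices of the k largest router scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def active_params(scores, k=2):
    """Count the parameters touched when a token goes to its top-k experts."""
    if len(scores) != len(EXPERT_PARAMS):
        raise ValueError("need one router score per expert")
    return SHARED_PARAMS + sum(EXPERT_PARAMS[i] for i in top_k(scores, k))


# Hypothetical router scores for three tokens of differing difficulty.
hard_token = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05]
mixed_token = [0.7, 0.1, 0.1, 0.1, 0.6, 0.2]
easy_token = [0.1, 0.2, 0.1, 0.3, 0.9, 0.8]

for name, scores in [("hard", hard_token), ("mixed", mixed_token), ("easy", easy_token)]:
    print(name, active_params(scores))

# Figures quoted in the article: 48B average active out of 1.6T total.
average_active_fraction = 48e9 / 1.6e12
print(f"average active fraction: {average_active_fraction:.1%}")
```

In this toy layout the three tokens touch 2,500, 1,500, and 500 parameters respectively. Applied to the article's own numbers, 48 billion active out of 1.6 trillion total is about 3% of the model per token, and the quoted 33 to 56 billion range corresponds to roughly 2.1% to 3.5%.
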
The MoE design implements a “Zero-Compute Experts” framework, ensuring that routine execution elements pass through lighter subnetworks, entirely eliminating the idle computational overhead that typically penalizes ultra-dense models.

To sustain a functional 1-million-token context window without incurring catastrophic hardware bottlenecks, Meituan introduced LongCat Sparse Attention (LSA). Designed as an evolutionary iteration of DeepSeek Sparse Attention, LSA resolves the quadratic scoring costs and memory fragmentation that typically plague fine-grained sparse mechanisms through three distinct, orthogonal vectors:

Streaming-aware Indexing (SI): This system restructures the token selection pipeline by blending hardware-aligned contiguous data reads with dynamic random selection. By converting fragmented memory access into highly predictable, sequential blocks, the system achieves coalesced High Bandwidth Memory (HBM) utilization and elevated effective bandwidth.

Cross-Layer Indexing (CLI): Leveraging the empirical reality that attention saliency remains highly stable across adjacent hidden layers, CLI amortizes calculation costs. A single indexing pass successfully guides multiple consecutive layers during inference, a capability reinforced by cross-layer distillation throughout the training phase.

Hierarchical Indexing (HI): This approach applies a coarse-to-fine, two-stage scoring layout. The indexer performs a rapid, approximate block-level recall to filter candidates, before running fine-grained token selection exclusively on the remaining population.

Furthermore, Meituan integrated an N-gram Embedding module inherited from its lighter model lines. By expanding parameter allocation in sparse dimensions completely orthogonal to the MoE expert layout, the architecture appends 135 billion parameters to a 5-gram token combination framework.
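
As a rough illustration of the N-gram embedding idea, the Python sketch below hashes each token's trailing 5-gram into a large lookup table and adds the result to the ordinary token embedding. Only the 5-gram order comes from the article. The table sizes, the rolling hash, the additive combination, and every name in the code are assumptions chosen for the toy example, not Meituan's implementation.

```python
import random

VOCAB_SIZE = 100          # hypothetical toy vocabulary
DIM = 4                   # hypothetical embedding width
NGRAM_ORDER = 5           # the article mentions 5-gram combinations
NGRAM_BUCKETS = 10_000    # hypothetical: a table far larger than the vocabulary


def make_table(rows, dim, seed):
    """Build a table of small random vectors standing in for trained weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


TOKEN_TABLE = make_table(VOCAB_SIZE, DIM, seed=0)
NGRAM_TABLE = make_table(NGRAM_BUCKETS, DIM, seed=1)


def ngram_bucket(window):
    """Map a tuple of token ids to a table row with a simple rolling hash."""
    h = 0
    for tok in window:
        h = (h * 1_000_003 + tok + 1) % NGRAM_BUCKETS
    return h


def embed(token_ids):
    """Return one vector per token: token embedding plus hashed n-gram embedding."""
    vectors = []
    for pos, tok in enumerate(token_ids):
        # Up to NGRAM_ORDER most recent tokens, ending at the current one.
        window = tuple(token_ids[max(0, pos - NGRAM_ORDER + 1): pos + 1])
        base = TOKEN_TABLE[tok]
        local = NGRAM_TABLE[ngram_bucket(window)]
        vectors.append([b + e for b, e in zip(base, local)])
    return vectors


# The same final token gets a different vector when its context differs.
print(embed([1, 2, 3])[-1] != embed([4, 2, 3])[-1])
```

Each lookup reads a single extra table row per token, so the added parameters cost a memory read and no extra matrix multiplications. Distinct n-grams can collide in the same bucket in a hashed table like this one; how a production system handles that is outside the scope of the sketch.
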
The N-gram module expands the core embedding space by roughly 100-fold, allowing the model to capture dense local token relationships and accelerate large-batch inference operations by reducing memory input/output (I/O) bottlenecks.

Product: Post-Training, MOPD Framework and Benchmark Performance

While generalist large language models prioritize fluid, conversational interfaces, LongCat-2.0 focuses explicitly on multi-step engineering tasks, tool integration, and automated repository manipulation: agentic tasks, in other words. In standardized …
