Less than a week after completing the largest tech IPO of 2026, Cerebras Systems is making its most aggressive play yet to dominate the fast-growing AI inference market. On Monday, the Sunnyvale-based chipmaker announced that it is now running Kimi K2.6, a trillion-parameter open-weight model developed by Beijing-based Moonshot AI, for enterprise customers at nearly 1,000 tokens per second, a speed no GPU-based provider has come close to matching.

The result, independently verified by benchmarking firm Artificial Analysis, clocked in at 981 output tokens per second, making Cerebras 6.7 times faster than the next-fastest GPU-based cloud provider and 23 times faster than the median. For a standard agentic coding request involving 10,000 input tokens, Cerebras delivered the full response (including prompt processing, reasoning, and 500 output tokens) in 5.6 seconds, compared to 163.7 seconds on the official Kimi endpoint. That’s a 29-fold improvement in time to final answer.

“We’re really wanting to be very clear and show that we can do the largest models,” James Wang, Cerebras’ director of product marketing, told VentureBeat in an exclusive interview ahead of the announcement. “In this case, Kimi K2.6, a trillion-parameter MoE model on the wafer-scale architecture, and it runs also at this same incredible speed that we’re famous for.”

The announcement marks a critical inflection point for Cerebras, which has long battled a perception that its unorthodox wafer-scale chips, while blindingly fast, could only handle small and mid-sized models. Kimi K2.6 is the first trillion-parameter open-weight model the company has ever served in production. And with a freshly minted $95 billion market cap and $5.55 billion in IPO proceeds burning a hole in its balance sheet, Cerebras is signaling to Wall Street that it intends to compet …