Microsoft on Monday unveiled the Surface RTX Spark Dev Box, a compact desktop computer designed to let software developers run large AI models on their desks instead of paying for cloud computing — a move that directly challenges the per-token pricing model that has defined the AI industry’s economics since ChatGPT launched three and a half years ago.The device, announced at Microsoft Build 2026, packs Nvidia’s new Blackwell-architecture RTX Spark processor and 128 gigabytes of unified memory into a small-form-factor chassis, delivering what Nvidia rates at one petaflop of AI compute. In practical terms, that means a developer can load, run and interact with AI models exceeding 120 billion parameters without sending a single API call to the cloud.”These class of devices, we think, will get to about 100 billion parameter model running,” Pavan Davuluri, Microsoft’s executive vice president of Windows and Devices, said during a press briefing ahead of the event. He emphasized that raw model size is only part of the equation: “The model size is one thing, but for the model to be effective, it kind of needs to be able to have enough context, because a larger model, you feed it larger context.” At 100,000 tokens of context, he noted, the key-value cache alone can consume 40 to 50 gigabytes of memory — which is precisely why Microsoft and Nvidia engineered the device around a 128-gigabyte unified memory pool shared dynamically between the CPU and GPU.The machine will be available later this year in the United States, sold exclusively through Microsoft.com. The company did not disclose pricing.Why Microsoft is betting that AI’s future runs on fixed costs, not cloud metersThe Surface RTX Spark Dev Box arrives at a moment when the economics of AI dev …