Google debuts AI chips with 4X performance boost, secures Anthropic megadeal worth billions

Nov 6, 2025 | Technology

Google Cloud is introducing what it calls its most powerful artificial intelligence infrastructure to date, unveiling a seventh-generation Tensor Processing Unit and expanded Arm-based computing options designed to meet surging demand for AI model deployment — what the company characterizes as a fundamental industry shift from training models to serving them to billions of users.

The announcement, made Thursday, centers on Ironwood, Google’s latest custom AI accelerator chip, which will become generally available in the coming weeks. In a striking validation of the technology, Anthropic, the AI safety company behind the Claude family of models, disclosed plans to access up to one million of these TPU chips — a commitment worth tens of billions of dollars and among the largest known AI infrastructure deals to date.

The move underscores intensifying competition among cloud providers to control the infrastructure layer powering artificial intelligence, even as questions mount about whether the industry can sustain its current pace of capital expenditure. Google’s approach — building custom silicon rather than relying solely on Nvidia’s dominant GPU chips — amounts to a long-term bet that vertical integration from chip design through software will deliver superior economics and performance.

Why companies are racing to serve AI models, not just train them

Google executives framed the announcements around what they call “the age of inference” — a transition point where companies shift resources from training frontier AI models to deploying them in production applications serving millions or billions of requests daily.

“Today’s frontier models, including Google’s Gemini, Veo, and Imagen, and Anthropic’s Claude, train and serve on Tensor Processing Units,” said Amin Vahdat, vice president and general manager of AI and Infrastructure at Google Cloud. “For many organizations, the focus is shifting from training these models to powering useful, responsive interactions with them.”

This transition has profound implications for infrastructure requirements. Where training workloads can often tolerate batch processing and longer completion times, inference — the process of actually running a trained model to generate responses — demands consistently low latency, high throughput, and unwavering reliability. A chatbot that takes 30 seconds to respond, or a coding assistant that frequently times out, becomes unusable regardless of the …
