The enterprise voice AI market is in the middle of a land grab. ElevenLabs and IBM announced a collaboration just this week to bring premium voice capabilities into IBM’s watsonx Orchestrate platform. Google Cloud has been expanding its Chirp 3 HD voices. OpenAI continues to iterate on its own speech synthesis. And the market underpinning all of this activity is enormous — voice AI crossed $22 billion globally in 2026, with the voice AI agents segment alone projected to reach $47.5 billion by 2034, according to industry estimates.On Thursday morning, Mistral AI entered that fight with a fundamentally different proposition. The Paris-based AI startup released Voxtral TTS, what it calls the first frontier-quality, open-weight text-to-speech model designed specifically for enterprise use. Where every major competitor in the space operates a proprietary, API-first business — enterprises rent the voice, they don’t own it — Mistral is releasing the full model weights, inviting companies to download Voxtral TTS, run it on their own servers or even on a smartphone, and never send a single audio frame to a third party.It is a bet that the future of enterprise voice AI will not be shaped by whoever builds the best-sounding model, but by whoever gives companies the most control over it. And it arrives at a moment when Mistral, valued at $13.8 billion after a $2 billion Series C round led by Dutch chipmaker ASML last September, has been aggressively assembling the building blocks of a complete, enterprise-owned AI stack — from its Forge customization platform announced at Nvidia GTC earlier this month, to its AI Studio production infrastructure, to the Voxtral Transcribe speech-to-text model released just weeks ago.Voxtral TTS is the output layer that completes that picture, giving enterprises a speech-to-speech pipeline they can run end-to-end without relying on any external provider.”We see audio as a big bet and as a critical and maybe the only future interface with all the AI models,” Pierre Stock, Mistral’s vice president of science and the first employee hired at the company, said in an exclusive interview with VentureBeat. “This is something customers have been asking for.”A 3-billion-parameter model that fits on a laptop and runs six times faster than real-time speechThe technical specifications of Voxtral TTS read like a deliberate inversion of industry norms. Where most frontier TTS models are large and resource-intensive, Mistral built its model to be roughly three times sm …