ElevenLabs co-founder and CEO Mati Staniszewski says voice is becoming the next major interface for AI: the way people will increasingly interact with machines as models move beyond text and screens.
Speaking at Web Summit in Doha, Staniszewski told TechCrunch that voice models like those developed by ElevenLabs have recently moved beyond simply mimicking human speech, including emotion and intonation, to working in tandem with the reasoning capabilities of large language models. The result, he argued, is a shift in how people interact with technology.
In the years ahead, he said, “hopefully all our phones will go back in our pockets, and we can immerse ourselves in the real world around us, with voice as the mechanism that controls technology.”
That vision fueled ElevenLabs’s $500 million raise this week at an $11 billion valuation, and it is increasingly shared across the AI industry. OpenAI and Google have both made voice a central focus of their next-generation models, while Apple appears to be quietly building voice-adjacent, always-on technologies through acquisitions like Q.ai. As AI spreads into wearables, cars, and other new hardware, control is becoming less about tapping screens and more about speaking, making voice a key battleground for the next phase of AI development.
Iconiq Capital general partner Seth Pierrepont echoed that view onstage at Web Summit, arguing that while screens will continue to matter for gaming and entertainment, traditional input methods like keyboards are starting to feel “outdated.”
And as AI systems become more agentic, Pierrepont said, the interaction itself will also change, with models gaining the guardrails, integrations, and context needed to respond with less explicit prompting from users.
Staniszewski pointed to that agentic shift as one of the biggest changes underway. Rather than spelling out every instruction, he said future voice systems will increasingly rely on persistent memory and context built up over time, making interactions feel more natural and requiring less effort from users.
That evolution, he added, will influence how voice models are deployed. While high-quality audio models have largel …