AI is advancing at a rapid clip for businesses, and that’s especially true of speech and voice AI models.
Case in point: Today, ElevenLabs, the well-funded voice and AI sound effects startup founded by former Google and Palantir employees, debuted Conversational AI 2.0. It is a significant upgrade to the company's platform for building advanced voice agents for enterprise use cases such as customer support, call centers, and outbound sales and marketing.
The update adds a host of features designed to make interactions more natural, intelligent, and secure, aimed squarely at enterprise-level applications.
The launch comes just four months after the debut of the original platform, reflecting ElevenLabs’ commitment to rapid development, and a day after rival voice AI startup Hume launched its own new, turn-based voice AI model, EVI 3.
It also comes after new open source AI voice models hit the scene, prompting some AI influencers to declare ElevenLabs dead. It seems those declarations were, naturally, premature.
According to Jozef Marko from ElevenLabs’ engineering team, Conversational AI 2.0 is substantially better than its predecessor, setting a new standard for voice-driven experiences.
Enhancing naturalistic speech
A key highlight of Conversational AI 2.0 is what ElevenLabs calls a state-of-the-art turn-taking model.
The model is built to handle the nuances of human conversation, aiming to eliminate the awkward pauses and interruptions that can occur in traditional voice systems.
By analyzing conversational cues such as hesitations and filler words in real time, the agent can judge when to speak and when to listen.
This feature is particularly relevant for applications such as customer service, where agents must balance quick responses with the natural rhythms of a conversation.
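ElevenLabs has not published how its turn-taking model works, and it is presumably a trained model rather than a set of rules. Still, the basic idea of treating hesitations and filler words as cues can be illustrated with a minimal, rule-based sketch. The `TurnState` class, the `should_agent_speak` function, the `HOLD_WORDS` list, and the millisecond thresholds below are all hypothetical: the agent simply waits longer before replying when the user's last word suggests they are still mid-thought.

```python
from dataclasses import dataclass

# Hypothetical words suggesting the user is hesitating rather than finishing.
HOLD_WORDS = {"um", "uh", "erm", "hmm", "and", "but", "so"}


@dataclass
class TurnState:
    transcript: str  # partial transcript of the user's current utterance
    silence_ms: int  # milliseconds since the user last produced speech


def should_agent_speak(state: TurnState,
                       base_wait_ms: int = 700,
                       hold_wait_ms: int = 1500) -> bool:
    """Illustrative rule-based end-of-turn check (not ElevenLabs' model)."""
    words = state.transcript.lower().split()
    if not words:
        return False  # user has not said anything yet: keep listening
    last = words[-1].strip(".,!?")
    # A trailing filler or conjunction looks like a pause mid-thought,
    # so require a longer silence before the agent takes its turn.
    wait_ms = hold_wait_ms if last in HOLD_WORDS else base_wait_ms
    return state.silence_ms >= wait_ms


print(should_agent_speak(TurnState("I'd like to cancel my order.", 800)))  # True
print(should_agent_speak(TurnState("I'd like to cancel, um", 800)))        # False
```

With these assumed thresholds, a caller who trails off with "um" after 800 ms of silence keeps the floor, while one who finishes a sentence does not. A learned model would replace the hard-coded rules with probabilities drawn from audio and text cues.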
Multilingual support
Conversational AI 2.0 also introduces integrated language detection, enabling seamless multilingual discussions without the need for manual configuration.
This capability lets the agent recognize the language the user is speaking and reply in that language within the same interaction.
The feature caters to global enterprises seeking consistent service for diverse customer bases, removing language barriers and fostering more inclusive experiences.
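The article does not say how ElevenLabs detects language. As a rough illustration of per-turn detection, the sketch below scores each user utterance against small stopword lists and answers in the best-scoring language. The `STOPWORDS` and `GREETINGS` tables and the `detect_language` and `respond` functions are hypothetical; a production voice agent would rely on a trained language-identification model working on audio or transcripts.

```python
# Hypothetical stopword lists; a real agent would use a trained
# language-identification model instead.
STOPWORDS = {
    "en": {"the", "and", "is", "to", "my", "i", "you", "of", "a"},
    "es": {"el", "la", "y", "es", "de", "mi", "que", "por", "un"},
    "de": {"der", "die", "und", "ist", "zu", "mein", "ich", "nicht", "ein"},
}

# Hypothetical canned replies, one per supported language.
GREETINGS = {
    "en": "How can I help?",
    "es": "¿En qué puedo ayudarle?",
    "de": "Wie kann ich helfen?",
}


def detect_language(text: str, default: str = "en") -> str:
    """Return the language whose stopwords appear most often in `text`."""
    tokens = [t.strip(".,!?¿¡").lower() for t in text.split()]
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default when no stopword matched at all.
    return best if scores[best] > 0 else default


def respond(user_turn: str) -> str:
    # Detection runs on every turn, so a mid-call language switch is followed.
    return GREETINGS[detect_language(user_turn)]


print(respond("I want to check my order"))            # How can I help?
print(respond("Quiero cambiar la fecha de mi cita"))  # ¿En qué puedo ayudarle?
```

Because detection runs on every turn, a caller who switches from English to Spanish mid-call gets the next reply in Spanish without any manual configuration.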
Enterprise-grade
One of the more powerful additions is a built-in Retrieval-Augmented Generation (RAG) system. It lets the AI query external knowledge bases and retrieve relevant information with minimal latency while maintaining strong privacy protections.
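ElevenLabs has not detailed its retrieval pipeline. The general RAG pattern, though, is to rank knowledge-base passages by relevance to the user's question and pass the top matches to the language model as context. The sketch below assumes a plain bag-of-words cosine similarity in place of the learned embeddings most RAG systems use, and the `retrieve` and `build_prompt` functions and sample `KNOWLEDGE_BASE` entries are hypothetical.

```python
import math
from collections import Counter

# Hypothetical knowledge-base entries standing in for internal documentation.
KNOWLEDGE_BASE = [
    "Refunds are issued within 5 business days of approval.",
    "The X200 router supports Wi-Fi 6 and WPA3 encryption.",
    "Support hours are 9am to 5pm on weekdays.",
]


def _vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(w.strip(".,!?:;").lower() for w in text.split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = _vectorize(query)
    ranked = sorted(docs, key=lambda d: _cosine(q, _vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


print(retrieve("How long do refunds take?", KNOWLEDGE_BASE, k=1))
# ['Refunds are issued within 5 business days of approval.']
```

In practice, the word-count vectors would be replaced by embeddings and the linear scan by a vector index so that retrieval stays fast over large document collections.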
For example, in healthcare settings, this means a medical assistant agent can pull up treatment guidelines directly from an institution’s database without delay. In customer support, agents can access up-to-date product details from internal documentation to assis …