As enterprises confront the challenges of deploying AI agents in critical applications, a more pragmatic model is emerging that puts humans back in control as a strategic safeguard against AI failure.
One such example is Mixus, a platform that uses a “colleague-in-the-loop” approach to make AI agents reliable for mission-critical work.
This approach is a response to the growing evidence that fully autonomous agents are a high-stakes gamble.
The high cost of unchecked AI
The problem of AI hallucinations has become a tangible risk as companies explore AI applications. In a recent incident, the AI-powered code editor Cursor saw its own support bot invent a fake policy restricting subscriptions, sparking a wave of public customer cancellations.
Similarly, the fintech company Klarna famously reversed course on replacing customer service agents with AI after admitting the move resulted in lower quality. In a more alarming case, New York City’s AI-powered business chatbot advised entrepreneurs to engage in illegal practices, highlighting the catastrophic compliance risks of unmonitored agents.
These incidents are symptoms of a larger capability gap. According to a May 2025 Salesforce research paper, today’s leading agents succeed only 58% of the time on single-step tasks and just 35% of the time on multi-step ones, highlighting “a significant gap between current LLM capabilities and the multifaceted demands of real-world enterprise scenarios.”
The colleague-in-the-loop model
To bridge this gap, a new approach focuses on structured human oversight. “An AI agent should act at your direction and on your behalf,” Mixus co-founder Elliot Katz told VentureBeat. “But without built-in organizational oversight, fully autonomous agents often create more problems than they solve.”
This philosophy underpins Mixus’s colleague-in-the-loop model, which embeds human verification directly into automated workflows. For example, a large retailer might receive weekly reports from thousands of stores that contain critical operational data (e.g., sales volumes, labor hours, productivity ratios, compensation requests from headquarters). Human analysts must spend hours manually reviewing the data and making decisions based on heuristics. With Mixus, the AI agent automates the heavy lifting, analyzing complex patterns and flagging anomalies like unusually high salary requests or productivity outliers.
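The gating pattern described here can be sketched in a few lines of Python. This is a hypothetical illustration of the general colleague-in-the-loop idea, not Mixus's actual API: the `Task` model, the `high_risk` flag, and the `approve` callback (standing in for the human reviewer) are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Task:
    """A unit of agent work with a user-defined risk label (hypothetical model)."""
    name: str
    high_risk: bool = False

def run_agent_workflow(tasks, approve):
    """Run tasks, pausing high-risk ones for a human 'colleague' decision.

    `approve` is a callable standing in for the human reviewer: it receives
    a Task and returns True (proceed) or False (block). Low-risk tasks run
    automatically; high-risk tasks only run with explicit human sign-off.
    """
    completed, blocked = [], []
    for task in tasks:
        if task.high_risk and not approve(task):
            blocked.append(task.name)   # human rejected: agent does not proceed
            continue
        completed.append(task.name)     # low-risk, or human-approved
    return completed, blocked
```

In practice, the `approve` callback would route the task to a person (for example, via a chat or ticketing interface) rather than return synchronously, but the control flow is the same: routine analysis proceeds autonomously, while anything flagged high-risk stops and waits for a human.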
For high-stakes decisions like payment authorizations or policy violations — workflows defined by a human user as “high-risk” — the agent pauses and requires human approval before proceeding. The …