Data teams building AI agents keep running into the same failure mode. Questions that require joining structured data with unstructured content, such as sales figures alongside customer reviews or citation counts alongside academic papers, break single-turn RAG systems. New research from Databricks puts a number on that failure gap. The company’s AI research team tested a multi-step agentic approach against state-of-the-art single-turn RAG baselines across nine enterprise knowledge tasks and reported gains of 20% or more on Stanford’s STaRK benchmark suite, along with consistent improvement across Databricks’ own KARLBench evaluation framework, according to the research. Databricks argues the performance gap between single-turn RAG and multi-step agents on hybrid data tasks is an architectural problem, not a model quality problem.

The work builds on Databricks’ earlier instructed retriever research, which showed retrieval improvements on unstructured data using metadata-aware queries. This latest research adds structured data sources, such as relational tables and SQL warehouses, into the same reasoning loop, addressing the class of questions enterprises most commonly fail to answer with current agent architectures.

“RAG works, but it doesn’t scale,” Michael Bendersky, research director at Databricks, told VentureBeat. “If you want to make your agent even better, and you want to understand why you have declining sales, now you have to help the agent see the tables and look at the sales data. Your RAG pipeline will become incompetent at that task.”

Single-turn retrieval cannot encode structural constraints

The core finding is that standard RAG systems fail when a query mixes a precise structured filter with an open-ended semantic search. Consider a question like “Which of our products have had declining sales over the past three months, and what potentially related issues are brought up in customer reviews on various seller sites?” The sales data lives in a warehouse.
The review sentiment lives in unstru …