There is a lot of enterprise data trapped in PDF documents. To be sure, gen AI tools have been able to ingest and analyze PDFs, but accuracy, time and cost have been less than ideal. New technology from Databricks could change that.

The company this week detailed its “ai_parse_document” technology, now integrated with Databricks’ Agent Bricks platform. The technology addresses a critical bottleneck in enterprise AI adoption: Approximately 80% of enterprise knowledge remains locked in PDFs, reports and diagrams that AI systems struggle to accurately process and understand.

“It’s a common assumption that parsing PDFs is a solved problem, but in reality, it isn’t,” Erich Elsen, principal research scientist at Databricks, told VentureBeat. “The challenge isn’t just that documents are unstructured; it’s that enterprise PDFs are inherently complex. They mix digital-native content with scanned pages and photos of physical documents, alongside tables, charts and irregular layouts, and most existing tools fail to capture that information accurately.”

The hidden complexity behind document parsing

While optical character recognition (OCR) has existed for decades, Elsen argues that extracting usable, structured data from real-world enterprise documents remains fundamentally unsolved. Key elements such as tables with merged cells, figure captions and spatial relationships between document elements are routinely dropped or misread by existing tools, making downstream AI applications, retrieval-augmented generation (RAG) systems or business intelligence dashboards unreliable.

The typical enterprise workaround has been to stack multiple imperfect tools together: One service for layout detection, another for OCR, a third for table extraction, as well as additional APIs for figure analysis.
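
To make the stacked-tools workaround concrete, here is a minimal Python sketch of what such a pipeline tends to look like. It is an illustration only: every function is a hypothetical stub standing in for a separate service, the names (`ParsedPage`, `detect_layout`, `run_ocr`, `extract_tables`, `describe_figures`, `parse_page`) are invented for this example, and none of it reflects how Databricks or any particular vendor implements parsing.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedPage:
    """Accumulates whatever each stage manages to extract from one page."""
    raw: bytes
    regions: list[str] = field(default_factory=list)
    text: str = ""
    tables: list[list[list[str]]] = field(default_factory=list)
    figure_notes: list[str] = field(default_factory=list)


# Each function below is a hypothetical stand-in for a separate tool or API.

def detect_layout(page: ParsedPage) -> ParsedPage:
    # Stub: a real layout service would return region types with bounding boxes.
    page.regions = ["body", "table", "figure"]
    return page


def run_ocr(page: ParsedPage) -> ParsedPage:
    # Stub: pretend the raw bytes are already text.
    page.text = page.raw.decode("utf-8", errors="replace")
    return page


def extract_tables(page: ParsedPage) -> ParsedPage:
    # Runs only if the layout stage reported a table region.
    if "table" in page.regions:
        page.tables.append([["header"], ["cell"]])
    return page


def describe_figures(page: ParsedPage) -> ParsedPage:
    # Runs only if the layout stage reported a figure region.
    if "figure" in page.regions:
        page.figure_notes.append("figure 1: no caption recovered")
    return page


STAGES = [detect_layout, run_ocr, extract_tables, describe_figures]


def parse_page(raw: bytes) -> ParsedPage:
    """Pass one page through every stage in order."""
    page = ParsedPage(raw=raw)
    for stage in STAGES:
        page = stage(page)
    return page
```

Even in this toy version the coupling is visible: the table and figure stages act only on regions the layout stage reported, so a missed region upstream means content is dropped downstream without any error being raised.
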
This approach requires months of custom data engineering and ongoing maintenance as document formats evolve.

“To compensate, teams have had to stack multiple imperfect tools or build extensive custom pipelines, spending months on data engineering instead of innovation,” Elsen said. “ …