As LLMs have continued to improve, there has been some discussion in the industry about whether standalone data labeling tools are still needed, since LLMs are increasingly able to work with all types of data. HumanSignal, the lead commercial vendor behind the open-source Label Studio project, has a different view. Rather than seeing less demand for data labeling, the company is seeing more.

Earlier this month, HumanSignal acquired Erud AI and launched its physical Frontier Data Labs for novel data collection. But creating data is only half the challenge. Today, the company is tackling what comes next: proving that the AI systems trained on that data actually work. The new multi-modal agent evaluation capabilities let enterprises validate complex AI agents that generate applications, images, code and video.

“If you focus on the enterprise segments, then all of the AI solutions that they’re building still need to be evaluated, which is just another word for data labeling by humans and even more so by experts,” HumanSignal co-founder and CEO Michael Malyuk told VentureBeat in an exclusive interview.

The intersection of data labeling and agentic AI evaluation

Having the right data is great, but that’s not the end goal for an enterprise. Where modern data labeling is headed is evaluation.

It’s a fundamental shift in what enterprises need validated: not whether their model correctly classified an image, but whether their AI agent made good decisions across a complex, multi-step task involving reasoning, tool usage and code generation.

If evaluation is just data labeling for AI outputs, then the shift from models to agents represents a step change in what needs to be labeled. Where traditional data labeling might involve marking images or categorizing text, agent evaluation req …