Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves

by News Feed Editor | Aug 28, 2025 | Technology

Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

A new training framework developed by researchers at Tencent AI Lab and Washington University in St. Louis enables large language models (LLMs) to improve themselves without requiring any human-labeled data. The technique, called R-Zero, uses reinforcement learning to generate its own training data from scratch, addressing one of the main bottlenecks in creating self-evolving AI systems. R-Zero works by having two independent models co-evolve by interacting with and challenging each other.

Experiments show that R-Zero substantially improves reasoning capabilities across different LLMs, which could lower the complexity and costs of training advanced AI. For enterprises, this approach could accelerate the development of specialized models for complex reasoning tasks without the massive expense of curating labeled datasets.

The challenge of self-evolving LLMs

The idea behind self-evolving LLMs is to create AI systems that can autonomously generate, refine, and learn from their own experiences. This offers a scalable path toward more intelligent and capable AI. However, a major challenge is that training these models requires large volumes of high-quality tasks and labels, which act as supervision signals for the AI to learn from.

Relying on human annotators to create this data is not only costly and slow but also creates a fundamental bottleneck. It effectively limits an AI’s potential capabilities to what humans can teach it. To address this, researchers have developed label-free methods that derive reward signals directly from a model’s own outputs, for example, by measuring its confidence in an answer. While these methods eliminate the need for explicit labels, they still rely on a pre-existing set of tasks, thereby limiting their applicability in truly self-evolving scenarios.

AI Scaling Hits Its Limits

Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are:

Turning energy into a strategic advantage

Architecting efficient inference for real throughput gains

Unlocking competitive ROI with sustainable AI systems

Secure your spot to stay ahead: https://bit.ly/4mwGngO

Other approaches involve having models generat …

Article Attribution | Read More at Article Source

Forget data labeling: Tencent’s R-Zero shows how LLMs can train themselves

About RN

Website Awards

More Info