When enterprises fine-tune LLMs for new tasks, they risk breaking everything the models already know. This forces companies to maintain separate models for every skill.

Researchers at MIT, the Improbable AI Lab and ETH Zurich have developed a new technique that enables large language models to learn new skills and knowledge without forgetting their past capabilities.

Their technique, called self-distillation fine-tuning (SDFT), allows models to learn directly from demonstrations and their own experiments by leveraging the inherent in-context learning abilities of modern LLMs. Experiments show that SDFT consistently outperforms traditional supervised fine-tuning (SFT) while addressing the limitations of reinforcement learning algorithms.

For enterprise applications, the method enables a single model to accumulate multiple skills over time without suffering from performance regression on earlier tasks. This offers a potential pathway for building AI agents that can adapt to dynamic business environments, gathering new proprietary knowledge and skills as needed without requiring expensive retraining cycles or losing their general reasoning abilities.

The challenge of continual learning

Once an LLM is trained and deployed, it remains static. It does not update its parameters to acquire new skills, internalize new knowledge, or improve from experience. To build truly adaptive AI, the industry needs to solve “continual learning,” allowing systems to accumulate knowledge much like humans do throughout their careers.

The most effective way for models to learn is through “on-policy learning.” In this approach, the model learns from data it generates itself, allowing it to correct its own errors and reasoning processes. This stands in contrast to learning by simply mimicking static datasets. Without on-policy learning, models are prone to “catastrophic forgetting,” a phenomenon where learning a new task causes the model to lose its past knowledge and ability to perform previously learned tasks.
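To make the core idea more concrete: the article describes SDFT as distilling the model's own in-context behavior (when prompted with a demonstration) back into the same model without the demonstration, using data the model generates itself. The paper's actual training pipeline is not detailed here, so the following is only a minimal, illustrative sketch under those assumptions; the model name, prompts, loss formulation and hyperparameters are placeholders, not the researchers' released code.

```python
# Illustrative sketch of self-distillation from in-context behavior (assumed
# reading of SDFT, not the authors' implementation). A frozen "teacher" copy of
# the model answers a task with a demonstration in its context; the "student"
# copy is trained, without the demonstration, to match the teacher's token
# distribution on that self-generated answer.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
student = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
teacher = AutoModelForCausalLM.from_pretrained(MODEL_NAME)  # frozen copy of the same weights
teacher.eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

demonstration = "Q: What is 12 * 7?\nA: 12 * 7 = 84.\n\n"  # in-context demonstration (illustrative)
task_prompt = "Q: What is 13 * 6?\nA:"

# 1) The teacher sees the demonstration plus the task and answers on-policy,
#    i.e. with tokens it samples itself.
teacher_inputs = tokenizer(demonstration + task_prompt, return_tensors="pt")
with torch.no_grad():
    generated = teacher.generate(**teacher_inputs, max_new_tokens=32, do_sample=True)
completion_ids = generated[:, teacher_inputs["input_ids"].shape[1]:]

# 2) The student sees only the task prompt (no demonstration) followed by the
#    teacher's completion, and is trained to match the teacher's distribution
#    over those completion tokens (a KL distillation loss).
prompt_ids = tokenizer(task_prompt, return_tensors="pt")["input_ids"]
student_input = torch.cat([prompt_ids, completion_ids], dim=1)
teacher_input = torch.cat([teacher_inputs["input_ids"], completion_ids], dim=1)

num_completion = completion_ids.shape[1]
with torch.no_grad():
    t_logits = teacher(teacher_input).logits[:, -num_completion - 1:-1, :]
s_logits = student(student_input).logits[:, -num_completion - 1:-1, :]

loss = F.kl_div(
    F.log_softmax(s_logits, dim=-1),
    F.softmax(t_logits, dim=-1),
    reduction="batchmean",
)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Because the training signal comes from the model's own demonstration-conditioned outputs rather than a static dataset, this kind of update stays close to what the model already does, which is the intuition behind why the approach is claimed to cause less catastrophic forgetting than standard supervised fine-tuning.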