When Liquid AI, a startup founded in 2023 by MIT computer scientists, introduced its second-generation Liquid Foundation Models (LFM2) in July 2025, the pitch was straightforward: deliver the fastest on-device foundation models on the market using the new “liquid” architecture, with training and inference efficiency that made small models a serious alternative to cloud-only large language models (LLMs) such as OpenAI’s GPT series and Google’s Gemini. The initial release shipped dense checkpoints at 350M, 700M, and 1.2B parameters, a hybrid architecture heavily weighted toward gated short convolutions, and benchmark numbers that placed LFM2 ahead of similarly sized competitors like Qwen3, Llama 3.2, and Gemma 3 on both quality and CPU throughput. The message to enterprises was clear: real-time, privacy-preserving AI on phones, laptops, and vehicles no longer required sacrificing capability for latency.

In the months since that launch, Liquid has expanded LFM2 into a broader product line, adding task- and domain-specialized variants, a small video ingestion and analysis model, and an edge-focused deployment stack called LEAP, and has positioned the models as the control layer for on-device and on-prem agentic systems. Now, with the publication of the detailed, 51-page LFM2 technical report on arXiv, the company is going a step further: making public the architecture search process, training data mixture, distillation objective, curriculum strategy, and post-training pipeline behind those models. And unlike earlier open models, LFM2 is built around a repeatable recipe: a hardware-in-the-loop search process, a training curriculum that compensates for smaller parameter budgets, and a post-training pipeline tuned for instruction following and tool use. Rather than just offering weights and an API, Liquid is effectively publishing a detailed blueprint that other organizations can use as a reference for training their own small, efficient models from scratch, tuned to their own hardware and deployment constraints.

A model family designed around real constraints, not GPU labs

The technical report begins with a premise enterprises are intimately familiar with: real AI systems hit limits long before benchmarks do. Latency budgets, peak memory ceilings, and thermal throttling define what can actually run in production, especially on laptops, tablets, com …