Model minimalism: The new AI strategy saving companies millions

by | Jun 27, 2025 | Technology

This article is part of VentureBeat’s special issue, “The Real Cost of AI: Performance, Efficiency and ROI at Scale.” Read more from this special issue.

The advent of large language models (LLMs) has made it easier for enterprises to envision the kinds of projects they can undertake, leading to a surge in pilot programs now transitioning to deployment. 

However, as these projects gained momentum, enterprises realized that the earlier LLMs they had used were unwieldy and, worse, expensive. 

Enter small language models and distillation. Models like Google’s Gemma family, Microsoft’s Phi and Mistral’s Small 3.1 allowed businesses to choose fast, accurate models that work for specific tasks. Enterprises can opt for a smaller model for particular use cases, allowing them to lower the cost of running their AI applications and potentially achieve a better return on investment. 

LinkedIn distinguished engineer Karthik Ramgopal told VentureBeat that companies opt for smaller models for a few reasons. 

“Smaller models require less compute, memory and faster inference times, which translates directly into lower infrastructure OPEX (operational expenditures) and CAPEX (capital expenditures) given GPU costs, availability and power requirements,” Ramgoapl said. “Task-specific models have a narrower scope, making their behavior more aligned and maintainable over time without complex prompt engineering.”

Model developers price their small models accordingly. OpenAI’s o4-mini costs $1.1 per million tokens for inputs and $4.4/million tokens for outputs, compared to the full o3 version at $10 for inputs and $40 for outputs. 

Enterprises today have a larger pool of small models, task-specific models and distilled models to choose from. These days, most flagship models offer a range of sizes. For example, the Claude family of models from Anthropic comprises Claude Opus, the largest model, Claude Sonnet, the all-purpose model, and Claude Haiku, the smallest version. These models are compact enough to operate on portable devices, such as laptops or mobile phones. 

The savings question

When discussing return on investment, though, the question is always: What does ROI look like? Should it be a return on the costs incurred or the time savings that ultimately means dollars saved down the line? Experts VentureBeat spoke to said ROI can be difficult to judge because some companies believe they’ve already reached ROI by cutting time spent on a task while others are waiting for actual dollars saved or more business brought in to say if AI investments have actually worked.

Article Attribution | Read More at Article Source