Pinterest cut AI costs 90% by gutting a frontier model’s vision layer

by | May 29, 2026 | Technology

At 620 million monthly users, calling a frontier model for every image recommendation isn’t a strategy — it’s a bill. Pinterest CTO Matt Madrigal solved it by gutting Qwen3-VL’s vision layer and rebuilding it with proprietary embeddings, cutting costs 90% and boosting accuracy 30%.Madrigal’s team has been heavily investing in customizing open-source models “foundationally in-house.” “If you’ve got really unique data that you can then fine-tune an open source model with, data quality will, frankly, outweigh or overcome model size,” Madrigal explained in a recent VB Beyond the Pilot podcast. How Pinterest customized Qwen for visual discoveryPinterest, which has around 620 million monthly active users, has long applied open source models for visual search and discovery, going back to Google’s BERT and OpenAI’s CLIP. The company fine-tuned its own Pin CLIP on the latter, incorporating proprietary visual embeddings and image metadata. Pinterest’s conversational shopping assistant, Navigator 1, was built on Qwen3-VL and customized in “pretty significant” ways. Madrigal’s team essentially “ripped out” Qwen’s vision encoder layer and fine-tuned the model on proprietary multimodal embeddings. This has allowed them to capture metadata around pins and images that can then be precomputed offline and regularly retrained on new information to deliver personalized experiences. “Open-source models, especially with open Apache licenses where you can truly tweak a lot of open weights and customize for unique use cases — that’s where we’ve found open source to be so powerful for us,” Madrigal said. Bringing their own embeddings allows his team to gain context around metadata, pins, and images; also, notably, the model performs better at runtime and inference. Without these embeddings, devs would have to call and …

Article Attribution | Read More at Article Source