New embedding model leaderboard shakeup: Google takes #1 while Alibaba’s open source alternative closes gap

by | Jul 18, 2025 | Technology

Google has moved its new Gemini Embedding model (gemini-embedding-001) to general availability. The model currently ranks number one overall on the Massive Text Embedding Benchmark (MTEB). It is now a core part of the Gemini API and Vertex AI, where developers can use it to build applications such as semantic search and retrieval-augmented generation (RAG).

While a number-one ranking is a strong debut, the embedding-model landscape is highly competitive, and Google’s proprietary model faces direct competition from capable open-source alternatives. That leaves enterprises with a new strategic choice: adopt the top-ranked proprietary model, or an open-source challenger that scores nearly as well and offers more control.

What’s under the hood of Google’s Gemini embedding model

At their core, embeddings convert text (or other data types) into numerical vectors, lists of numbers that capture the key features of the input. Inputs with similar meanings produce vectors that lie closer together in this numerical space. This enables applications that go well beyond simple keyword matching, such as RAG systems that retrieve relevant information and feed it to LLMs.
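To make "closer together" concrete, here is a minimal Python sketch of how a semantic search system might rank documents by cosine similarity, a common way to compare embeddings. The three-dimensional vectors and document names are hypothetical stand-ins; real models such as gemini-embedding-001 produce much longer vectors, and an application would obtain them from the model's API rather than writing them by hand.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity to the query, highest first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings; a real model would generate these.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.2, 0.1],
    "office hours": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of the query "how do I get my money back?"
query = [0.85, 0.15, 0.05]

for name, score in top_k(query, docs):
    print(f"{name}: {score:.3f}")
```

Cosine similarity compares only the direction of two vectors, not their length, which is why it is a frequent default for text embeddings. Some pipelines normalize vectors to unit length up front and use a plain dot product instead, which yields the same ranking.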

Embeddings can also be applied to other modalities such as images, video and audio. For instance, an e-commerce company might use a multimodal embedding model to generate a single numerical representation of a product that combines its text description and its images.

For enterprises, embedding models can power more accurate internal search engines, sophisticated document clustering, classification tasks, sentiment analysis and anomaly detection. Embeddings are also becoming an important part of agentic applications, where AI agents must retrieve and match different types of documents and prompts.
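As one illustration of the anomaly-detection use case, the Python sketch below flags items whose embeddings point away from the average direction of a set of known-normal items. The ticket names, the vectors and the 0.8 threshold are all hypothetical; in practice the threshold would be tuned on real data, and the vectors would come from an embedding model.

```python
import math


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def flag_anomalies(reference: list[list[float]],
                   candidates: dict[str, list[float]],
                   threshold: float = 0.8) -> list[str]:
    """Return candidate names whose cosine similarity to the reference centroid is below threshold."""
    center = normalize(centroid([normalize(v) for v in reference]))
    flagged = []
    for name, vec in candidates.items():
        similarity = sum(a * b for a, b in zip(center, normalize(vec)))
        if similarity < threshold:
            flagged.append(name)
    return flagged


# Hypothetical embeddings of ordinary customer-support tickets.
normal_tickets = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.85, 0.1, 0.05]]
new_tickets = {
    "late delivery": [0.8, 0.15, 0.05],
    "suspicious login": [0.05, 0.1, 0.95],
}
print(flag_anomalies(normal_tickets, new_tickets))  # ['suspicious login']
```

The same centroid idea can serve as a simple classifier: compute one centroid per label and assign each new item to the label whose centroid is most similar.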

One of the key features of Gemini Embedding is its built-in flexibility. It has been trained through a technique known as Matryoshka Representation Learning (MRL …
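Matryoshka Representation Learning trains a model so that shorter prefixes of an embedding are each useful representations on their own. The Python sketch below shows how an application might exploit that, assuming the standard MRL property: keep the first `dims` values and re-normalize to unit length so cosine or dot-product comparisons still behave. The function names are hypothetical, and the 3072 and 768 sizes in the storage comparison are illustrative assumptions; the Gemini API documentation lists the output dimensions gemini-embedding-001 actually supports.

```python
import math


def truncate_embedding(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` values of an MRL-style embedding and rescale to unit length.

    Assumes the model was trained with MRL, so the leading values form a
    usable lower-dimensional embedding on their own.
    """
    if not 0 < dims <= len(vec):
        raise ValueError(f"dims must be between 1 and {len(vec)}")
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in prefix]


def float32_bytes(dims: int) -> int:
    """Raw storage for one vector stored as 32-bit floats (4 bytes each)."""
    return dims * 4


full = [0.5, 0.5, 0.5, 0.5]          # hypothetical 4-d unit vector
short = truncate_embedding(full, 2)  # first 2 values, re-normalized
print(short)

# Illustrative sizes (assumed, not taken from the article).
print(float32_bytes(3072), float32_bytes(768))  # 12288 3072
```

Shorter vectors cut storage and speed up similarity search at some cost in retrieval quality, so the truncation size is a trade-off worth evaluating on your own data.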