Picking the Perfect Partner: A Guide to Choosing the Best Embedding Models in Ollama
Zack Saadioui
8/12/2025
What’s up, everyone! Let's talk about something that’s been a game-changer for me & a lot of other developers building with LLMs: embedding models. Specifically, how to choose the right one when you're working with Ollama. Honestly, it can feel a bit like picking a starter Pokémon – they all look cool, but the one you choose will have a BIG impact on your journey.
So, here's the thing. You've decided to build a RAG (Retrieval-Augmented Generation) application. You want your LLM to be able to answer questions about your own documents, your company's knowledge base, or maybe even your personal notes. To do that, you need a way to convert all that text into a format that a computer can understand & compare. That's where embedding models come in. They take your text & turn it into a bunch of numbers (a vector) that captures its semantic meaning.
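To make that concrete, here's a minimal sketch of what "text in, vector out" looks like against a locally running Ollama server, using its embeddings endpoint (this assumes you've already pulled mxbai-embed-large, which we cover in the tutorial below):

```python
import requests

# Ask the local Ollama server (default port 11434) to embed a piece of text.
# Assumes the mxbai-embed-large model has already been pulled.
response = requests.post(
    "http://localhost:11434/api/embeddings",
    json={
        "model": "mxbai-embed-large",
        "prompt": "Ollama makes it easy to run LLMs locally.",
    },
)
embedding = response.json()["embedding"]

print(len(embedding))  # the vector's dimensionality (1,024 for mxbai-embed-large)
print(embedding[:5])   # the first few of the numbers that encode the text's meaning
```

Two texts about the same topic will produce vectors that point in similar directions, & that's what makes semantic comparison possible.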
Ollama has made it ridiculously easy to get up & running with open-source LLMs locally, & that includes a bunch of awesome embedding models. But with so many options, how do you pick the best one for your project? That’s what we’re going to dive into today. We’ll look at some of the most popular models, how they stack up against each other, & how to get them working in your own projects.
What’s the Big Deal with Embedding Models Anyway?
Before we get into the nitty-gritty of comparing models, let's quickly recap why embedding models are so darn important. At their core, they solve a fundamental problem: LLMs only know what they were trained on. If you want them to be able to answer questions about your private documents or data that's more recent than their last training run, you're out of luck.
That's where RAG comes in. By using an embedding model to create a searchable vector database of your own documents, you can find the most relevant information for a user's query & then feed that information to your LLM as context. This allows your LLM to generate a much more accurate & relevant response. It's a pretty elegant solution to a tricky problem, & it's all powered by these amazing little embedding models.
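"Finding the most relevant information" usually boils down to comparing the query's vector against your document vectors with something like cosine similarity. Here's a hedged sketch of that mechanic (the tiny vectors are made up for illustration — in a real app they'd come from your embedding model):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how similar two embedding vectors are, from -1 to 1."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors standing in for real embeddings of your documents.
doc_vectors = {
    "our return policy": np.array([0.9, 0.1, 0.0]),
    "our shipping times": np.array([0.1, 0.8, 0.2]),
}
query_vector = np.array([0.85, 0.15, 0.05])  # stand-in for the embedded user query

# Retrieve the document most similar to the query,
# then hand its text to the LLM as context.
best_doc = max(doc_vectors, key=lambda name: cosine_similarity(query_vector, doc_vectors[name]))
print(best_doc)  # -> "our return policy"
```

In production you'd use a vector database to do this search at scale, but the core idea really is this simple.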
The Contenders: Popular Embedding Models in Ollama
Ollama offers a growing library of embedding models, but a few have emerged as the most popular & widely used. Let's take a look at some of the top contenders:
mxbai-embed-large: This is a powerful, all-around embedding model that has gained a lot of popularity for its strong performance. It's a great choice for a wide range of tasks & has been shown to outperform some of the big proprietary models like OpenAI's text-embedding-3-large in certain benchmarks.
nomic-embed-text: This model has been making waves for its excellent performance, especially on long-context tasks. It's a really solid choice if you're working with large documents or need to capture the nuances of complex information.
all-minilm: This is a more lightweight model, which makes it a great option if you're working with limited hardware. While it might not be as powerful as some of the larger models, it's surprisingly capable & a great starting point for many projects.
bge-m3: This is another powerhouse model that has consistently ranked at the top of the MTEB (Massive Text Embedding Benchmark) leaderboard. It's a multilingual model with a long context window, making it incredibly versatile.
The Showdown: How Do These Models Compare?
Okay, so we have our contenders. Now for the million-dollar question: which one is the best? The answer, as you might have guessed, is... it depends. The best model for you will depend on a few key factors:
Your specific use case: Are you building a semantic search engine for short product descriptions? Or a question-answering system for long, technical documents? Different models excel at different tasks.
Your hardware: Some of these models are pretty resource-intensive. If you're running on a laptop with limited RAM, you'll want to choose a smaller, more efficient model.
Your desired level of accuracy: If you need the absolute highest level of accuracy, you'll probably want to go with one of the top-performing models on the MTEB leaderboard. But if "good enough" is good enough, you might be able to get away with a smaller, faster model.
To give you a better idea of how these models stack up, let's look at some of the data from the MTEB leaderboard. The MTEB is a comprehensive benchmark that evaluates embedding models on a wide range of tasks, including classification, clustering, retrieval, & more.
While it's tough to find a single, definitive leaderboard that includes all of these models in a perfect side-by-side comparison, we can piece together a pretty good picture from various sources. Here's a general overview of how these models tend to perform:
| Model | Strengths | Weaknesses | Best For |
| --- | --- | --- | --- |
| bge-m3 | Top-tier performance on MTEB, multilingual, long context window | Can be resource-intensive | Projects that require the absolute best performance & versatility |
| mxbai-embed-large | Excellent all-around performance, often outperforming proprietary models | Can be resource-intensive | A wide range of tasks, from semantic search to RAG |
| nomic-embed-text | Strong performance, especially on long-context tasks | Might not be the absolute best on all benchmarks | RAG applications with long, complex documents |
| all-minilm | Lightweight & efficient | Not as powerful as the larger models | Projects with limited hardware resources or where speed is a priority |
It's also worth noting that new & improved models are being released all the time. The world of open-source AI moves FAST, so it's always a good idea to keep an eye on the latest MTEB leaderboards & community discussions to see what's new & exciting.
Getting Your Hands Dirty: A Simple RAG Tutorial with Ollama
Okay, enough talk. Let's get to the fun part: actually building something! Here's a quick & easy tutorial on how to build a simple RAG application using Ollama & Python. We'll use mxbai-embed-large for this example, but you can easily swap it out for any of the other models we've discussed.
Step 1: Install Ollama & Pull the Models
First things first, you'll need to install Ollama on your machine. You can find the installation instructions on the Ollama website. Once you have Ollama installed, you'll need to pull the embedding model & a language model. We'll use mxbai-embed-large as our embedding model.
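Pulling both takes two quick commands in your terminal (the embedding model is the one from this tutorial; llama3 is just an assumption for the chat side — swap in whatever generation model you prefer):

```bash
# Pull the embedding model we'll use throughout this tutorial.
ollama pull mxbai-embed-large

# Pull a language model to generate the final answers.
# llama3 is an assumption here — any chat model in the Ollama library works.
ollama pull llama3
```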