The crispy, lightweight ColBERT family from Mixedbread.
🍞 Looking for a simple end-to-end retrieval setup? Meet Mixedbread Search, our multi-modal and multi-lingual search solution.
mxbai-edge-colbert-v0-32m
This model is a lightweight, 32-million-parameter ColBERT with a projection dimension of 64. It is built on top of Ettin-32M, meaning it benefits from all of ModernBERT's architectural efficiencies. Despite this small footprint, it is the best-performing "edge-sized" retriever, outperforming ColBERTv2 and many models with over 10 times more parameters. It can create multi-vector representations for documents of up to 32,000 tokens and is fully compatible with the PyLate library.
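As a quick illustration of what those multi-vector representations look like, here is a minimal sketch (it assumes PyLate is installed as described under Usage below, and that `encode` returns one token-embedding array per input, as in PyLate's API):

```python
# Minimal sketch: each input text is encoded into one vector per token,
# each vector having the 64-dimensional projection described above.
from pylate import models

model = models.ColBERT(
    model_name_or_path="mixedbread-ai/mxbai-edge-colbert-v0-32m",
)
embeddings = model.encode(["a short example document"], is_query=False)
print(embeddings[0].shape)  # (num_tokens, 64): one 64-d vector per token
```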
Usage
To use this model, you first need to install PyLate:
via uv:

```bash
# uv
uv add pylate

# uv + pip
uv pip install pylate
```

or via pip:

```bash
pip install -U pylate
```
Once installed, the model is immediately ready to generate representations and index documents:
```python
from pylate import indexes, models, retrieve

# Step 1: Load the model
model = models.ColBERT(
    model_name_or_path="mixedbread-ai/mxbai-edge-colbert-v0-32m",
)

# Step 2: Initialize an index (here, PLAID, for larger document collections)
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode your documents
documents_ids = ["1", "2", "3"]
documents = ["document 1 text", "document 2 text", "document 3 text"]
documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)
```
That's all you need to do to encode a full collection! Your documents are indexed and ready to be queried:
```python
# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["query for document 3", "query for document 1"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
```
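The returned `scores` hold, for each query, the ids of the best-matching documents along with their late-interaction scores. A minimal sketch for inspecting them, assuming each hit is a dict with `"id"` and `"score"` keys as in PyLate's documented output:

```python
# Print the ranked matches for each query.
queries = ["query for document 3", "query for document 1"]
for query, hits in zip(queries, scores):
    print(f"Query: {query}")
    for hit in hits:
        print(f"  document {hit['id']} -> score {hit['score']:.2f}")
```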
Reranking
Thanks to its extreme parameter efficiency, this model is particularly well-suited to use as a reranker following an even more lightweight first-stage retrieval, such as static embedding models. Reranking is just as straightforward:
```python
from pylate import rank, models

# Load the model
model = models.ColBERT(
    model_name_or_path="mixedbread-ai/mxbai-edge-colbert-v0-32m",
)

# Define queries and documents
queries = [
    "query A",
    "query B",
]
documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]
documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# Embed them
queries_embeddings = model.encode(
    queries,
    is_query=True,
)
documents_embeddings = model.encode(
    documents,
    is_query=False,
)

# Perform reranking
reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
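To sketch the full two-stage setup described above, here is one illustrative way to shortlist candidates with a static embedding model and then rerank them with this model. The first-stage model name and the shortlist size are assumptions; swap in whichever fast retriever you prefer:

```python
from sentence_transformers import SentenceTransformer
from pylate import models, rank

# Stage 1 model: a static embedding model (illustrative choice; any fast
# first-stage retriever works here).
first_stage = SentenceTransformer("sentence-transformers/static-retrieval-mrl-en-v1")
colbert = models.ColBERT(model_name_or_path="mixedbread-ai/mxbai-edge-colbert-v0-32m")

corpus = ["document A", "document B", "document C"]
query = "query A"

# Stage 1: cosine-similarity shortlist over the whole corpus.
corpus_embeddings = first_stage.encode(corpus)
query_embedding = first_stage.encode([query])
similarities = first_stage.similarity(query_embedding, corpus_embeddings)
shortlist = similarities[0].argsort(descending=True)[:2].tolist()

# Stage 2: rerank only the shortlisted documents with late interaction.
reranked = rank.rerank(
    documents_ids=[shortlist],
    queries_embeddings=colbert.encode([query], is_query=True),
    documents_embeddings=colbert.encode([[corpus[i] for i in shortlist]], is_query=False),
)
print(reranked)
```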
Evaluation
Results on BEIR
| Model | AVG | MS MARCO | SciFact | Touche | FiQA | TREC-COVID | NQ | DBPedia |
|---|---|---|---|---|---|---|---|---|
| **Large Models (>100M)** | | | | | | | | |
| GTE-ModernColBERT-v1 | 0.547 | 0.453 | 0.763 | 0.312 | 0.453 | 0.836 | 0.618 | 0.480 |
| ColBERTv2 | 0.488 | 0.456 | 0.693 | 0.263 | 0.356 | 0.733 | 0.562 | 0.446 |
| **Medium Models (<35M)** | | | | | | | | |
| mxbai-edge-colbert-v0-32m | 0.521 | 0.450 | 0.740 | 0.313 | 0.390 | 0.775 | 0.600 | 0.455 |
| answerai-colbert-small-v1 | 0.534 | 0.434 | 0.740 | 0.250 | 0.410 | 0.831 | 0.594 | 0.464 |
| bge-small-en-v1.5 | 0.517 | 0.408 | 0.713 | 0.260 | 0.403 | 0.759 | 0.502 | 0.400 |
| snowflake-s | 0.519 | 0.402 | 0.722 | 0.235 | 0.407 | 0.801 | 0.509 | 0.410 |
| **Small Models (<25M)** | | | | | | | | |
| mxbai-edge-colbert-v0-17m | 0.490 | 0.416 | 0.719 | 0.316 | 0.326 | 0.713 | 0.551 | 0.410 |
| colbert-muvera-micro | 0.394 | 0.364 | 0.662 | 0.251 | 0.254 | 0.561 | 0.386 | 0.332 |
| all-MiniLM-L6-v2 | 0.419 | 0.365 | 0.645 | 0.169 | 0.369 | 0.472 | 0.439 | 0.323 |
Results on LongEmbed
| Model | AVG |
|---|---|
| **Large Models (>100M)** | |
| GTE-ModernColBERT-v1 (32k) | 0.898 |
| GTE-ModernColBERT-v1 (4k) | 0.809 |
| granite-embedding-english-r2 | 0.656 |
| ColBERTv2 | 0.428 |
| **Medium Models (<50M)** | |
| mxbai-edge-colbert-v0-32m (32k) | 0.849 |
| mxbai-edge-colbert-v0-32m (4k) | 0.783 |
| granite-embedding-small-english-r2 | 0.637 |
| answerai-colbert-small-v1 | 0.441 |
| bge-small-en-v1.5 | 0.312 |
| snowflake-arctic-embed-s | 0.356 |
| **Small Models (<25M)** | |
| mxbai-edge-colbert-v0-17m (32k) | 0.847 |
| mxbai-edge-colbert-v0-17m (4k) | 0.776 |
| all-MiniLM-L6-v2 | 0.298 |
| colbert-muvera-micro | 0.405 |
For more details on evaluations, please read our Tech Report.
Community
Please join our Discord community and share your feedback and thoughts! We're here to help and always happy to chat.
License
Apache 2.0
Citation
If you use our model, please cite the associated tech report:
```bibtex
@misc{takehi2025fantasticsmallretrieverstrain,
  title={Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report},
  author={Rikiya Takehi and Benjamin Clavié and Sean Lee and Aamir Shakir},
  year={2025},
  eprint={2510.14880},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2510.14880},
}
```
If you specifically use its projection heads, or discuss their effect, please cite our report on using different projections for ColBERT models:
```bibtex
@misc{clavie2025simpleprojectionvariantsimprove,
  title={Simple Projection Variants Improve ColBERT Performance},
  author={Benjamin Clavié and Sean Lee and Rikiya Takehi and Aamir Shakir and Makoto P. Kato},
  year={2025},
  eprint={2510.12327},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2510.12327},
}
```
Finally, if you use PyLate in your work, please cite PyLate itself:
```bibtex
@misc{PyLate,
  title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
  author={Chaffin, Antoine and Sourty, Raphaël},
  url={https://github.com/lightonai/pylate},
  year={2024}
}
```