---
language:
- en
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- transformers
pipeline_tag: sentence-similarity
library_name: PyLate
metrics:
- MaxSim_accuracy@1
- MaxSim_accuracy@3
- MaxSim_accuracy@5
- MaxSim_accuracy@10
- MaxSim_precision@1
- MaxSim_precision@3
- MaxSim_precision@5
- MaxSim_precision@10
- MaxSim_recall@1
- MaxSim_recall@3
- MaxSim_recall@5
- MaxSim_recall@10
- MaxSim_ndcg@10
- MaxSim_mrr@10
- MaxSim_map@100
model-index:
- name: mxbai-edge-colbert-v0-17m
results:
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoClimateFEVER
type: NanoClimateFEVER
metrics:
- type: MaxSim_accuracy@1
value: 0.28
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.4
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.52
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.76
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.28
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.15333333333333332
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.132
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.114
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.13166666666666665
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.19566666666666666
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.26899999999999996
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.4323333333333333
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.32110454808344563
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.3874603174603174
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.24386041506572398
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoDBPedia
type: NanoDBPedia
metrics:
- type: MaxSim_accuracy@1
value: 0.78
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.92
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.94
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.98
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.78
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.6466666666666666
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.6000000000000001
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.53
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.11231081441624795
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.19718498662932682
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.2515039585287889
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.3827585204510568
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.6668364782038155
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.8583333333333334
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.532138470469583
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoFEVER
type: NanoFEVER
metrics:
- type: MaxSim_accuracy@1
value: 0.86
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.94
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.98
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 1
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.86
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.3399999999999999
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.21199999999999997
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.10999999999999999
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.7966666666666667
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.91
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.95
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.98
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.9113009102891444
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.9095238095238095
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.8804077380952381
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoFiQA2018
type: NanoFiQA2018
metrics:
- type: MaxSim_accuracy@1
value: 0.5
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.64
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.66
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.78
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.5
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.3
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.22399999999999998
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.13599999999999998
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.28257936507936504
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.45084920634920633
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.5012857142857142
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.5930079365079366
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.5242333453411014
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.5883888888888889
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.4663825636525529
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoHotpotQA
type: NanoHotpotQA
metrics:
- type: MaxSim_accuracy@1
value: 0.94
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 1
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 1
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 1
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.94
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.5666666666666667
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.3559999999999999
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.18599999999999994
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.47
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.85
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.89
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.93
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.8918313878583112
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.9666666666666666
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.8388140096618357
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoMSMARCO
type: NanoMSMARCO
metrics:
- type: MaxSim_accuracy@1
value: 0.58
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.68
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.74
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.82
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.58
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.22666666666666668
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.14800000000000002
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.08199999999999999
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.58
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.68
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.74
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.82
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.6902252545188936
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.6501031746031746
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.6593558218584534
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoNFCorpus
type: NanoNFCorpus
metrics:
- type: MaxSim_accuracy@1
value: 0.5
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.6
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.64
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.7
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.5
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.4
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.3560000000000001
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.28
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.06472705697215374
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.09880268446365006
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.12166169350643057
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.14660598371037648
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.3681157447334094
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.5658333333333333
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.17231143133969085
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoNQ
type: NanoNQ
metrics:
- type: MaxSim_accuracy@1
value: 0.58
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.72
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.82
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.9
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.58
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.24
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.17199999999999996
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.09599999999999997
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.55
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.67
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.78
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.86
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.7085689105698346
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.6787142857142857
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.6543090180774391
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoQuoraRetrieval
type: NanoQuoraRetrieval
metrics:
- type: MaxSim_accuracy@1
value: 0.96
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 1
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 1
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 1
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.96
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.3933333333333333
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.24799999999999997
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.13199999999999998
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.8373333333333334
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.9486666666666668
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.9626666666666668
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.9833333333333333
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.9609623318470277
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.98
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.9420639971139971
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoSCIDOCS
type: NanoSCIDOCS
metrics:
- type: MaxSim_accuracy@1
value: 0.48
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.72
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.76
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.86
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.48
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.35999999999999993
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.284
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.192
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.10166666666666666
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.22166666666666665
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.28966666666666663
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.39166666666666666
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.38777798626622473
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.6133809523809525
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.30159944576020853
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoArguAna
type: NanoArguAna
metrics:
- type: MaxSim_accuracy@1
value: 0.16
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.5
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.62
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.82
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.16
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.16666666666666663
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.12400000000000003
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.08199999999999999
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.16
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.5
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.62
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.82
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.47567106787289914
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.3671111111111111
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.37145718958470925
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoSciFact
type: NanoSciFact
metrics:
- type: MaxSim_accuracy@1
value: 0.72
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.84
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.84
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.86
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.72
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.29333333333333333
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.18799999999999997
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.09599999999999997
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.695
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.82
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.84
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.85
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.7943497079909279
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.7799999999999998
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.7769113522745599
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoTouche2020
type: NanoTouche2020
metrics:
- type: MaxSim_accuracy@1
value: 0.8163265306122449
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.9795918367346939
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.9795918367346939
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 1
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.8163265306122449
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.7210884353741496
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.6530612244897959
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.5510204081632653
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.05492567388541453
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.14618659815433527
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.21615229246201462
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.35053940409516887
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.6255925383730097
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.8906705539358599
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.4430753846973072
name: Maxsim Map@100
- task:
type: nano-beir
name: Nano BEIR
dataset:
name: NanoBEIR mean
type: NanoBEIR_mean
metrics:
- type: MaxSim_accuracy@1
value: 0.6274097331240187
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.7645839874411303
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.8076609105180533
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.883076923076923
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.6274097331240187
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.36982731554160114
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.28438932496075353
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.19900156985871267
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.3720674033605012
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.5145402673535784
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.5716874609320216
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.6569419367767595
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.6405054009190804
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.7104758789962872
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.5602066798193307
name: Maxsim Map@100
license: apache-2.0
---
The crispy, lightweight ColBERT family from Mixedbread.
🍞 Looking for a simple end-to-end retrieval solution? Meet Mixedbread Search, our multi-modal and multi-lingual search solution.
# mxbai-edge-colbert-v0-17m

This model is a lightweight, 17 million parameter ColBERT with a projection dimension of 48. It is built on top of [Ettin-17M](https://huggingface.co/jhu-clsp/ettin-encoder-17m), meaning it benefits from all of ModernBERT's architectural efficiencies.

Despite this extreme efficiency, it is the best-performing "edge-sized" retriever, outperforming ColBERTv2 and many models with over 10 times more parameters. It can create multi-vector representations for documents of up to 32,000 tokens and is fully compatible with the [PyLate](https://github.com/lightonai/pylate) library.

## Usage

To use this model, you first need to install PyLate:

via uv

```bash
# uv
uv add pylate

# uv + pip
uv pip install pylate
```

or pip

```bash
# pip
pip install -U pylate
```

Once installed, the model is immediately ready to use to generate representations and index documents:

```python
from pylate import indexes, models, retrieve

# Step 1: Load the model
model = models.ColBERT(
    model_name_or_path="mixedbread-ai/mxbai-edge-colbert-v0-17m",
)

# Step 2: Initialize an index (here, PLAID, for larger document collections)
index = indexes.PLAID(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode your documents
documents_ids = ["1", "2", "3"]
documents = ["document 1 text", "document 2 text", "document 3 text"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)
```

That's all you need to do to encode a full collection!

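For intuition about how these multi-vector representations are compared at query time, here is a minimal, dependency-free sketch of late-interaction (MaxSim) scoring, the operator the reported `MaxSim_*` metrics are named after. It is an illustration only, not PyLate's implementation: the helper names `l2_normalize` and `maxsim` are hypothetical, the toy vectors are 2-dimensional rather than 48-dimensional, and it assumes token embeddings are L2-normalized so that dot products behave like cosine similarities.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper for this sketch)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token embedding, keep the
    highest dot product over all document token embeddings, then sum."""
    score = 0.0
    for q in query_vecs:
        score += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return score


# Toy token embeddings: one list of vectors per text (assumed values).
query = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 1.0]]]
doc_a = [l2_normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
doc_b = [l2_normalize(v) for v in [[1.0, 1.0]]]

score_a = maxsim(query, doc_a)  # every query token has an exact match in doc_a
score_b = maxsim(query, doc_b)  # doc_b only partially matches each query token

print(score_a > score_b)
```

Because each query token is matched independently against every document token, a document that covers all query tokens well (`doc_a`) scores higher than one that only partially matches each of them (`doc_b`).
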
Your documents are indexed and ready to be queried:

```python
# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["query for document 3", "query for document 1"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
```

### Reranking

Thanks to its extreme parameter efficiency, this model is particularly well-suited to being used as a re-ranker after an even more lightweight first-stage retriever, such as a static embedding model. Re-ranking is just as straightforward:

```python
from pylate import rank, models

# Load the model
model = models.ColBERT(
    model_name_or_path="mixedbread-ai/mxbai-edge-colbert-v0-17m",
)

# Define queries and documents
queries = [
    "query A",
    "query B",
]

documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# Embed them
queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

# Perform reranking
reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```

## Evaluation

### **Results on BEIR**

| Model                         |    AVG    | MS MARCO  |  SciFact  |  Touche   |   FiQA    | TREC-COVID |    NQ     |  DBPedia  |
| :---------------------------- | :-------: | :-------: | :-------: | :-------: | :-------: | :--------: | :-------: | :-------: |
| **Large Models (>100M)**      |           |           |           |           |           |            |           |           |
| GTE-ModernColBERT-v1          | **0.547** |   0.453   | **0.763** | **0.312** | **0.453** | **0.836**  | **0.618** | **0.480** |
| ColBERTv2                     |   0.488   | **0.456** |   0.693   |   0.263   |   0.356   |   0.733    |   0.562   |   0.446   |
| **Medium Models (<35M)**      |           |           |           |           |           |            |           |           |
| mxbai-edge-colbert-v0-32m     |   0.521   | **0.450** | **0.740** | **0.313** |   0.390   |   0.775    | **0.600** |   0.455   |
| answerai-colbert-small-v1     | **0.534** |   0.434   | **0.740** |   0.250   | **0.410** | **0.831**  |   0.594   | **0.464** |
| bge-small-en-v1.5             |   0.517   |   0.408   |   0.713   |   0.260   |   0.403   |   0.759    |   0.502   |   0.400   |
| snowflake-s                   |   0.519   |   0.402   |   0.722   |   0.235   |   0.407   |   0.801    |   0.509   |   0.410   |
| **Small Models (<25M)**       |           |           |           |           |           |            |           |           |
| **mxbai-edge-colbert-v0-17m** | **0.490** | **0.416** | **0.719** | **0.316** |   0.326   | **0.713**  | **0.551** | **0.410** |
| colbert-muvera-micro          |   0.394   |   0.364   |   0.662   |   0.251   |   0.254   |   0.561    |   0.386   |   0.332   |
| all-MiniLM-L6-v2              |   0.419   |   0.365   |   0.645   |   0.169   | **0.369** |   0.472    |   0.439   |   0.323   |

### **Results on LongEmbed**

| Model                                         |    AVG    |
| :-------------------------------------------- | :-------: |
| **Large Models (>100M)**                      |           |
| GTE-ModernColBERT-v1 (32k)                    | **0.898** |
| GTE-ModernColBERT-v1 (4k)                     |   0.809   |
| granite-embedding-english-r2                  |   0.656   |
| ColBERTv2                                     |   0.428   |
| **Medium Models (<50M)**                      |           |
| mxbai-edge-colbert-v0-32m (32k)               | **0.849** |
| mxbai-edge-colbert-v0-32m (4k)                |   0.783   |
| granite-embedding-small-english-r2            |   0.637   |
| answerai-colbert-small-v1                     |   0.441   |
| bge-small-en-v1.5                             |   0.312   |
| snowflake-arctic-embed-s                      |   0.356   |
| **Small Models (<25M)**                       |           |
| **mxbai-edge-colbert-v0-17m (32k)**           | **0.847** |
| **mxbai-edge-colbert-v0-17m (4k)**            |   0.776   |
| all-MiniLM-L6-v2                              |   0.298   |
| colbert-muvera-micro                          |   0.405   |

For more details on evaluations, please read our [Tech Report](https://mixedbread.com/papers/small_colbert_report.pdf).

## Community

Please join our [Discord Community](https://discord.gg/j5dWb3Qkm9) and share your feedback and thoughts! We are here to help and always happy to chat.

## License

Apache 2.0

## Citation

If you use our model, please cite the associated tech report:

```bibtex
@misc{takehi2025fantasticsmallretrieverstrain,
      title={Fantastic (small) Retrievers and How to Train Them: mxbai-edge-colbert-v0 Tech Report},
      author={Rikiya Takehi and Benjamin Clavié and Sean Lee and Aamir Shakir},
      year={2025},
      eprint={2510.14880},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2510.14880},
}
```

If you specifically use its projection heads, or discuss their effect, please cite our report on using different projections for ColBERT models:

```bibtex
@misc{clavie2025simpleprojectionvariantsimprove,
      title={Simple Projection Variants Improve ColBERT Performance},
      author={Benjamin Clavié and Sean Lee and Rikiya Takehi and Aamir Shakir and Makoto P. Kato},
      year={2025},
      eprint={2510.12327},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2510.12327},
}
```

Finally, if you use PyLate in your work, please cite PyLate itself:

```bibtex
@misc{PyLate,
      title={PyLate: Flexible Training and Retrieval for Late Interaction Models},
      author={Chaffin, Antoine and Sourty, Raphaël},
      url={https://github.com/lightonai/pylate},
      year={2024}
}
```