ms-marco-ror-reranker
A cross-encoder reranker fine-tuned for Research Organization Registry (ROR) affiliation matching.
Model Description
This model is fine-tuned from cross-encoder/ms-marco-MiniLM-L-12-v2 on ROR affiliation matching data.
It reranks candidate ROR organizations given an affiliation string query.
Training
- Base model: cross-encoder/ms-marco-MiniLM-L-12-v2
- Training examples: 45,061
- Training traces: 2,004
- Negative sampling: Hard negatives from retrieval candidates
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-05
- Max sequence length: 256
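The hard-negative sampling listed above can be sketched as follows. This is a minimal illustration, not the actual training code: the candidate field names (`ror_id`, `name`), the placeholder IDs, and the `max_negatives` cap are all assumptions made for the example. The idea is that the retrieved candidate matching the gold ROR record becomes a positive pair, while other retrieved candidates, which look similar to the query by construction, become negatives.

```python
from typing import Dict, List, Tuple


def build_training_pairs(
    affiliation: str,
    gold_ror_id: str,
    candidates: List[Dict[str, str]],
    max_negatives: int = 4,
) -> List[Tuple[str, str, int]]:
    """Turn one retrieval trace into labeled (query, candidate, label) pairs.

    The "ror_id" and "name" keys and the max_negatives cap are hypothetical;
    the real trace schema may differ.
    """
    pairs: List[Tuple[str, str, int]] = []
    negatives_used = 0
    # Candidates are assumed to arrive in retrieval-rank order, so the
    # negatives kept first are the ones the retriever found most similar.
    for cand in candidates:
        if cand["ror_id"] == gold_ror_id:
            pairs.append((affiliation, cand["name"], 1))
        elif negatives_used < max_negatives:
            pairs.append((affiliation, cand["name"], 0))
            negatives_used += 1
    return pairs


# Placeholder IDs, not real ROR identifiers.
trace_candidates = [
    {"ror_id": "ror-A", "name": "University of California, Berkeley"},
    {"ror_id": "ror-B", "name": "University of California, Los Angeles"},
    {"ror_id": "ror-C", "name": "Berkeley College"},
]
pairs = build_training_pairs(
    "UC Berkeley, Berkeley, CA", "ror-A", trace_candidates, max_negatives=1
)
for query, candidate, label in pairs:
    print(label, "|", query, "|", candidate)
```

Negatives drawn from retrieval output tend to be harder than random organizations because they already share tokens with the affiliation string.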
Usage
from sentence_transformers import CrossEncoder
model = CrossEncoder("cometadata/ms-marco-ror-reranker")
# Score affiliation-candidate pairs
pairs = [
    ["University of California, Berkeley", "University of California, Berkeley"],
    ["University of California, Berkeley", "University of California, Los Angeles"],
]
scores = model.predict(pairs)
print(scores) # Higher score = better match
Intended Use
This model is designed to rerank candidate ROR organizations in affiliation matching pipelines. Use it after an initial retrieval step (e.g., dense retrieval with a Snowflake Arctic embedding model) has produced a candidate list.
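A retrieve-then-rerank pipeline of this kind might be wired together as in the sketch below. The function `retrieve_then_rerank`, the candidate fields (`ror_id`, `name`), and both toy stand-ins are hypothetical: `toy_retrieve` takes the place of a dense retriever and `toy_score` takes the place of the cross-encoder, so the example runs without downloading any model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Candidate = Dict[str, str]


def retrieve_then_rerank(
    affiliation: str,
    retrieve_fn: Callable[[str, int], List[Candidate]],
    score_fn: Callable[[List[Tuple[str, str]]], Sequence[float]],
    k_retrieve: int = 20,
    k_final: int = 5,
) -> List[Tuple[Candidate, float]]:
    """Fetch candidates, score each (affiliation, name) pair, sort by score."""
    candidates = retrieve_fn(affiliation, k_retrieve)
    if not candidates:
        return []
    pairs = [(affiliation, cand["name"]) for cand in candidates]
    scores = [float(s) for s in score_fn(pairs)]
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:k_final]


# Toy corpus with placeholder IDs, not real ROR identifiers.
CORPUS = [
    {"ror_id": "ror-B", "name": "University of California, Los Angeles"},
    {"ror_id": "ror-A", "name": "University of California, Berkeley"},
    {"ror_id": "ror-C", "name": "Berkeley College"},
]


def toy_retrieve(query: str, k: int) -> List[Candidate]:
    # Stand-in for the first-stage retriever: returns the first k records.
    return CORPUS[:k]


def toy_score(pairs: List[Tuple[str, str]]) -> List[float]:
    # Stand-in for the reranker: Jaccard overlap of whitespace tokens.
    scores = []
    for query, text in pairs:
        q, t = set(query.lower().split()), set(text.lower().split())
        scores.append(len(q & t) / len(q | t))
    return scores


results = retrieve_then_rerank(
    "University of California, Berkeley", toy_retrieve, toy_score, k_final=2
)
for cand, score in results:
    print(f"{score:.2f}  {cand['ror_id']}  {cand['name']}")
```

To use the actual reranker, pass `model.predict` from a loaded `CrossEncoder` as `score_fn`; it accepts a list of text pairs and returns one score per pair.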
Training Data
Trained on traces from cometadata/ror-pipeline-traces (affrodb_s2aff_traces config).
Timestamp
2026-01-07T21:35:26.376404+00:00