FLES-1 v14: Sparse Lexical Encoder (Best Quality)

Paper: Closed-Loop FLOPS Regulation for Learned Sparse Retrieval, by Golvis Tavarez, Mindoval, Inc.

Model Description

FLES-1 transforms text into interpretable sparse vectors using BERT's MLM predictions. Each of the 30,522 dimensions corresponds to a token in BERT's WordPiece vocabulary, so the vectors are readable, debuggable, and compatible with standard inverted indices (Elasticsearch, OpenSearch).
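
For readers new to learned sparse retrieval, the snippet below is a minimal sketch of the general SPLADE-style pooling that turns MLM logits into a token-indexed sparse vector. It is not the exact FLES-1 implementation, and the bert-base-uncased checkpoint is only a stand-in: an untrained backbone will not produce sparse output on its own; sparsity comes from the trained FLES-1 weights.

# Minimal sketch of SPLADE-style sparse encoding (illustrative, not the FLES-1 code)
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # stand-in for the trained encoder

def sparse_encode(text):
    inputs = tokenizer(text, return_tensors="pt")
    logits = model(**inputs).logits                                  # (1, seq_len, 30522)
    # log(1 + ReLU) keeps weights non-negative; mask out padding positions
    weights = torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1)
    vec = weights.max(dim=1).values.squeeze(0)                       # max-pool over the sequence
    nonzero = vec.nonzero().squeeze(-1)
    return {tokenizer.convert_ids_to_tokens(i.item()): round(vec[i].item(), 2) for i in nonzero}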

Trained with two novel techniques:

  • L1 FLOPS regularization – eliminates the gradient explosion that causes training instability in all published sparse retrieval models
  • Step-interval CLFR – closed-loop sparsity control that adjusts regularization strength every ~6,250 steps (one epoch in our setup) based on measured sparsity; a proportional-control sketch follows this list
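
The CLFR paper is not yet published, so the exact controller update is not reproduced here; the snippet below is only a plausible proportional-control sketch consistent with the settings reported in the Training section (target_nnz_d=400, gain=0.1, one adjustment per ~6,250 steps). All names are illustrative.

# Illustrative step-interval CLFR controller (assumed proportional update, not the published rule)
ADJUST_INTERVAL = 6_250   # ~one epoch in this setup
TARGET_NNZ_D = 400        # target average non-zeros per document vector
GAIN = 0.1

def clfr_update(lambda_d, measured_nnz_d):
    # Documents too dense -> raise the L1 FLOPS weight; too sparse -> relax it.
    relative_error = (measured_nnz_d - TARGET_NNZ_D) / TARGET_NNZ_D
    return lambda_d * (1.0 + GAIN * relative_error)

# In the training loop (sketch):
# if step > 0 and step % ADJUST_INTERVAL == 0:
#     lambda_d = clfr_update(lambda_d, running_avg_nnz_d)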

Metrics

nfcorpus (threshold=0.3)

| Metric     | Value  |
|------------|--------|
| NDCG@10    | 0.3049 |
| MRR        | 0.5182 |
| Recall@100 | 0.2544 |
| Avg NNZ    | 359    |

Reproducibility

This recipe was run 5 times with different seeds:

| Run             | NDCG@10 |
|-----------------|---------|
| v14 (original)  | 0.305   |
| v17c            | 0.299   |
| v31a            | 0.299   |
| v32 (seed=7777) | 0.299   |
| v26a (seed=42)  | 0.272   |

Mean: 0.295. Std: 0.013. v14 is at the high end of the variance for this recipe. Expected reproduction: 0.295 ± 0.013.

Baselines

| Model                             | NDCG@10 | NNZ | Distillation  | Training Data |
|-----------------------------------|---------|-----|---------------|---------------|
| FLES-1 v14                        | 0.305   | 359 | None          | 200K MS MARCO |
| BM25 (Pyserini, stemmed)          | 0.325   | –   | –             | –             |
| BM25 (regex, no stemming)         | 0.307   | –   | –             | –             |
| SPLADE-Doc (no distillation)      | 0.323   | –   | None          | Full MS MARCO |
| SPLADE original (no distillation) | 0.336   | –   | None          | Full MS MARCO |
| SPLADE-cocondenser (distilled)    | 0.340   | 125 | Cross-encoder | Full MS MARCO |

FLES-1 v14 is 6% behind Pyserini BM25 (0.325) and 6-10% behind non-distilled SPLADE variants. The paper's contribution is the training methodology (CLFR, L1 FLOPS, lambda-steps tradeoff), not the absolute numbers.

Cross-Domain (zero-shot)

| Dataset  | Domain             | NDCG@10 |
|----------|--------------------|---------|
| nfcorpus | Medical            | 0.305   |
| scifact  | Scientific claims  | 0.557   |
| fiqa     | Financial Q&A      | 0.212   |
| arguana  | Argument retrieval | 0.142   |
| scidocs  | Scientific docs    | 0.112   |

Production

| Metric                | GPU (A100)   | CPU         |
|-----------------------|--------------|-------------|
| Encoding throughput   | 245 docs/sec | 87 docs/sec |
| Query latency (avg)   | 10 ms        | 33 ms       |
| Index size (1K docs)  | 0.32 MB      | –           |
| vs. dense 768-d index | 9.5x smaller | –           |
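
Because each dimension maps to a vocabulary token, the sparse vectors can be served from a standard inverted index. The sketch below shows one way to do this with Elasticsearch's rank_features field type; it is an illustrative recipe, not an official integration, and the index name, mapping, and localhost URL are placeholders.

# Illustrative: store FLES-1 vectors as Elasticsearch rank_features and query with rank_feature clauses
from elasticsearch import Elasticsearch
from fles1_encoder import FLES1Encoder

es = Elasticsearch("http://localhost:9200")
encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v14")

# One rank_features field holds the token -> weight map per document (weights must be positive).
es.indices.create(index="fles1-demo", mappings={
    "properties": {
        "text": {"type": "text"},
        "fles1": {"type": "rank_features"},
    },
})

text = "Machine learning is a subfield of artificial intelligence."
es.index(index="fles1-demo", id="doc1", document={"text": text, "fles1": encoder.encode(text)})

# Query: one rank_feature clause per query term, weighted by the query-side term weight.
query_vec = encoder.encode("What is machine learning?")
resp = es.search(index="fles1-demo", query={"bool": {"should": [
    {"rank_feature": {"field": f"fles1.{term}", "boost": weight, "linear": {}}}
    for term, weight in query_vec.items()
]}})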

Training

Foundation: fles1-v12b (2 generations from bert-base-uncased)
Data: 200,000 MS MARCO random negatives
Epochs: 2 (12,500 steps)
Loss: InfoNCE (τ=0.05) + L1 FLOPS (λ_d=0.00003) + anti-collapse
Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
Optimizer: AdamW, lr=2e-5, batch_size=32, 7 negatives per query
Hardware: 1× A100 80GB, ~2 hours
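
As a reading aid, here is a minimal sketch of how the loss terms listed above could be combined. The InfoNCE temperature and λ_d match the values above, but the L1 FLOPS term is an assumed mean-L1 penalty on document vectors, and the anti-collapse term is left as a placeholder because its exact form is not specified here.

# Illustrative combined loss: InfoNCE (positives vs. random negatives) + L1 FLOPS on documents
import torch
import torch.nn.functional as F

TAU = 0.05        # InfoNCE temperature
LAMBDA_D = 3e-5   # L1 FLOPS weight on document vectors (0.00003)

def training_loss(q_vecs, d_vecs, anti_collapse=0.0):
    # q_vecs: (B, V) query vectors; d_vecs: (B * docs_per_query, V) document vectors,
    # where document i * docs_per_query is the positive for query i.
    scores = (q_vecs @ d_vecs.T) / TAU
    docs_per_query = d_vecs.shape[0] // q_vecs.shape[0]
    targets = torch.arange(q_vecs.shape[0]) * docs_per_query
    info_nce = F.cross_entropy(scores, targets)
    l1_flops = LAMBDA_D * d_vecs.abs().sum(dim=1).mean()   # pushes document vectors toward sparsity
    return info_nce + l1_flops + anti_collapse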

The CLFR Paper

Full paper coming soon.

This model is the primary result of a 75-run empirical study of training dynamics in sparse retrieval. The study discovered:

  • L1 FLOPS regularization (reduces training crashes from 10-17 to 0-7 per run)
  • Epoch-level closed-loop sparsity control (one adjustment per ~6,250 steps outperforms adjusting at every one of the 12,500 steps)
  • The lambda-steps tradeoff (eff_reg = λ × steps, sweet spot 0.10-0.20)
  • The binary contrastive ceiling (0.298 ± 0.007 for InfoNCE with random negatives)
  • Checkpoint archaeology (longitudinal weight analysis across 43 training runs)

Limitations

  • Trained on MS MARCO (English web Q&A). Domain transfer to non-English or specialized domains requires fine-tuning.
  • NNZ=359 is denser than SPLADE (125). For latency-critical deployments, consider fles1-v12b (NNZ=139).
  • The 0.305 result is at the high end of variance for this recipe (mean=0.295).
  • Does not use knowledge distillation; the gap to distilled SPLADE (10.4%) is structural.

Usage

from fles1_encoder import FLES1Encoder

# Load model
encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v14")

# Encode text to sparse vector
sparse = encoder.encode("What is machine learning?")
# Returns: {'machine': 1.39, 'learning': 1.08, 'machines': 0.63, ...}

# Batch encode
vectors = encoder.encode_batch(["query 1", "query 2"], batch_size=32)

# Encode to term IDs (for inverted index)
ids, weights = encoder.encode_to_ids("What is machine learning?")
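
For quick experiments without a search engine, relevance can be scored directly as a sparse dot product over the token -> weight dictionaries returned by encode. The helper below is illustrative and not part of the fles1_encoder package; it assumes encode_batch returns one dictionary per input, like encode.

# Illustrative in-memory ranking via sparse dot product (assumes dict outputs as shown above)
def sparse_dot(query_vec, doc_vec):
    # Only terms present in both vectors contribute to the score.
    return sum(weight * doc_vec[term] for term, weight in query_vec.items() if term in doc_vec)

docs = ["Machine learning builds models from data.",
        "The stock market closed higher today."]
doc_vecs = encoder.encode_batch(docs, batch_size=32)   # assumed: one token->weight dict per doc
query_vec = encoder.encode("What is machine learning?")

for text, vec in sorted(zip(docs, doc_vecs), key=lambda p: sparse_dot(query_vec, p[1]), reverse=True):
    print(round(sparse_dot(query_vec, vec), 3), text)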

License

Apache 2.0

Golvis Tavarez, Mindoval, Inc. We thank Microsoft, Inc. for supporting this research through the Microsoft for Startups program. https://mindoval.com/ai-research
