# FLES-1 v14: Sparse Lexical Encoder (Best Quality)

Paper: *Closed-Loop FLOPS Regulation for Learned Sparse Retrieval*, Golvis Tavarez, Mindoval, Inc.
## Model Description
FLES-1 transforms text into interpretable sparse vectors using BERT's MLM predictions. Each of the 30,522 dimensions corresponds to a real vocabulary word: readable, debuggable, and compatible with standard inverted indices (Elasticsearch, OpenSearch).
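The exact activation FLES-1 applies to the MLM logits is not stated here; MLM-based sparse encoders commonly produce the document vector via a log-saturated ReLU over per-token logits, max-pooled across the sequence. A minimal numpy sketch under that assumption (`sparse_encode` is illustrative, not the model's API):

```python
import numpy as np

def sparse_encode(mlm_logits):
    """Turn per-token MLM logits (seq_len x vocab) into one sparse vector.

    Assumes the common SPLADE-style formulation log(1 + ReLU(logit)),
    max-pooled over the sequence; FLES-1 may differ in details.
    """
    activated = np.log1p(np.maximum(mlm_logits, 0.0))  # zero out negatives, log-saturate
    return activated.max(axis=0)                       # max-pool over tokens

# Toy example: 3 tokens, vocabulary of 5 terms
logits = np.array([[2.0, -1.0, 0.5, 0.0, -3.0],
                   [0.0,  3.0, 0.0, 0.0, -1.0],
                   [1.0,  0.0, 0.0, 0.0, -2.0]])
vec = sparse_encode(logits)
# Terms with only non-positive logits stay exactly 0, which is where the sparsity comes from
```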
Trained with two novel techniques:
- L1 FLOPS regularization, which suppresses the gradient explosions that cause training instability in published sparse retrieval models
- Step-interval CLFR, a closed-loop sparsity controller that adjusts regularization strength every ~6,250 steps (one epoch in our setup) based on measured sparsity
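A plausible shape for one CLFR adjustment, using the recipe's parameters (target_nnz_d=400, gain=0.1); the multiplicative-proportional update rule itself is an assumption, not taken from the paper:

```python
def clfr_update(lam, measured_nnz, target_nnz=400, gain=0.1):
    """One CLFR adjustment, applied every ~6,250 steps (once per epoch).

    If documents come out denser than the target, the L1 FLOPS weight is
    raised; if sparser, it is lowered. The multiplicative-proportional
    form is an assumption -- the paper's exact rule may differ.
    """
    rel_error = (measured_nnz - target_nnz) / target_nnz
    return lam * (1.0 + gain * rel_error)

lam = 3e-5                                  # lambda_d from the training recipe
lam = clfr_update(lam, measured_nnz=500)    # too dense -> lambda increases
```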
## Metrics

### nfcorpus (threshold=0.3)
| Metric | Value |
|---|---|
| NDCG@10 | 0.3049 |
| MRR | 0.5182 |
| Recall@100 | 0.2544 |
| Avg NNZ | 359 |
## Reproducibility
This recipe was run 5 times with different seeds:
| Run (seed) | NDCG@10 |
|---|---|
| v14 (original) | 0.305 |
| v17c | 0.299 |
| v31a | 0.299 |
| v32 (seed=7777) | 0.299 |
| v26a (seed=42) | 0.272 |
Mean: 0.295. Std: 0.013. v14 is at the high end of variance. Expected reproduction: 0.295 ± 0.013.
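The summary statistics can be reproduced directly from the table above (sample standard deviation):

```python
from statistics import mean, stdev

ndcg = [0.305, 0.299, 0.299, 0.299, 0.272]   # the five runs above
print(round(mean(ndcg), 3), round(stdev(ndcg), 3))  # 0.295 0.013
```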
## Baselines
| Model | NDCG@10 | NNZ | Distillation | Training Data |
|---|---|---|---|---|
| FLES-1 v14 | 0.305 | 359 | None | 200K MS MARCO |
| BM25 (Pyserini, stemmed) | 0.325 | n/a | n/a | n/a |
| BM25 (regex, no stemming) | 0.307 | n/a | n/a | n/a |
| SPLADE-Doc (no distillation) | 0.323 | n/a | None | Full MS MARCO |
| SPLADE original (no distillation) | 0.336 | n/a | None | Full MS MARCO |
| SPLADE-cocondenser (distilled) | 0.340 | 125 | Cross-encoder | Full MS MARCO |
FLES-1 v14 is 6% behind Pyserini BM25 (0.325) and 6-10% behind non-distilled SPLADE variants. The paper's contribution is the training methodology (CLFR, L1 FLOPS, lambda-steps tradeoff), not the absolute numbers.
## Cross-Domain (zero-shot)
| Dataset | Domain | NDCG@10 |
|---|---|---|
| nfcorpus | Medical | 0.305 |
| scifact | Scientific claims | 0.557 |
| fiqa | Financial Q&A | 0.212 |
| arguana | Argument retrieval | 0.142 |
| scidocs | Scientific docs | 0.112 |
## Production

| Metric | GPU (A100) | CPU |
|---|---|---|
| Encoding throughput | 245 docs/sec | 87 docs/sec |
| Query latency (avg) | 10 ms | 33 ms |
| Index size (1K docs) | 0.32 MB | n/a |
| Size vs. dense 768-d index | 9.5x smaller | n/a |
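The 9.5x figure is consistent with simple arithmetic, assuming the dense baseline stores 768 float32 values per document (the baseline's precision is an assumption):

```python
docs = 1_000
dense_mb = docs * 768 * 4 / 1e6        # 768-d float32 vectors: ~3.07 MB
sparse_mb = 0.32                       # measured index size from the table
print(round(dense_mb / sparse_mb, 1))  # 9.6, close to the reported 9.5x
```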
## Training

- Foundation: fles1-v12b (2 generations from bert-base-uncased)
- Data: 200,000 MS MARCO random negatives
- Epochs: 2 (12,500 steps)
- Loss: InfoNCE (τ=0.05) + L1 FLOPS (λ_d=0.00003) + anti-collapse
- Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
- Optimizer: AdamW, lr=2e-5, batch_size=32, 7 negatives per query
- Hardware: 1× A100 80GB, ~2 hours
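The loss structure above can be sketched in a few lines. This is a toy numpy illustration: `fles_loss` is illustrative, and the anti-collapse term is omitted because its form is not specified here.

```python
import numpy as np

def fles_loss(q, docs, tau=0.05, lam_d=3e-5):
    """Sketch of the objective: InfoNCE over one positive (docs[0]) and
    7 random negatives (docs[1:]), plus the L1 FLOPS penalty on the
    document vectors. Anti-collapse term not reproduced."""
    scores = docs @ q / tau                  # similarity logits
    scores = scores - scores.max()           # numerical stability
    info_nce = -np.log(np.exp(scores[0]) / np.exp(scores).sum())
    flops_l1 = lam_d * np.abs(docs).sum()    # drives document vectors sparse
    return info_nce + flops_l1

# Toy sparse vectors: 1 positive + 7 negatives over a 1,000-term vocabulary
rng = np.random.default_rng(0)
mask = rng.random((8, 1000)) < 0.05
docs = np.abs(rng.standard_normal((8, 1000))) * mask
q = docs[0] * 0.9 + 0.01                     # query close to the positive
print(fles_loss(q, docs))
```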
## The CLFR Paper
Full paper coming soon.
This model is the primary result of a 75-run empirical study of training dynamics in sparse retrieval. The study discovered:
- L1 FLOPS regularization (reduces training crashes from 10-17 to 0-7 per run)
- Epoch-level closed-loop sparsity control (1 adjustment per ~6,250 steps outperforms 12,500 per-step adjustments)
- The lambda-steps tradeoff (eff_reg = λ × steps, sweet spot 0.10-0.20)
- The binary contrastive ceiling (0.298 ± 0.007 for InfoNCE with random negatives)
- Checkpoint archaeology (longitudinal weight analysis across 43 training runs)
## Limitations
- Trained on MS MARCO (English web Q&A). Domain transfer to non-English or specialized domains requires fine-tuning.
- NNZ=359 is denser than SPLADE (125). For latency-critical deployments, consider fles1-v12b (NNZ=139).
- The 0.305 result is at the high end of variance for this recipe (mean=0.295).
- Does not use knowledge distillation; the gap to distilled SPLADE (10.4%) is structural.
## Usage

```python
from fles1_encoder import FLES1Encoder

# Load the model
encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v14")

# Encode text to a sparse vector
sparse = encoder.encode("What is machine learning?")
# Returns: {'machine': 1.39, 'learning': 1.08, 'machines': 0.63, ...}

# Batch encode
vectors = encoder.encode_batch(["query 1", "query 2"], batch_size=32)

# Encode to term IDs (for an inverted index)
ids, weights = encoder.encode_to_ids("What is machine learning?")
```
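Because each dimension is a vocabulary term, retrieval reduces to a weighted inverted-index lookup. A self-contained sketch of that scoring path, with hand-written term-weight dicts standing in for `encoder.encode` output:

```python
from collections import defaultdict

# Stand-ins for encoder.encode(...) output: term -> weight dicts
docs = {
    "d1": {"machine": 1.4, "learning": 1.1, "model": 0.6},
    "d2": {"stock": 1.3, "market": 1.2, "learning": 0.2},
}

# Build the inverted index: term -> [(doc_id, weight), ...]
index = defaultdict(list)
for doc_id, vec in docs.items():
    for term, w in vec.items():
        index[term].append((doc_id, w))

def search(query_vec, index):
    """Score docs by sparse dot product, touching only matching postings."""
    scores = defaultdict(float)
    for term, qw in query_vec.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return sorted(scores.items(), key=lambda kv: -kv[1])

query = {"machine": 1.39, "learning": 1.08}
print(search(query, index))  # d1 ranks first on overlapping terms
```

An Elasticsearch or OpenSearch deployment would store the same postings in a real index; the dot-product scoring shown here is the part the sparse representation makes possible.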
## License
Apache 2.0
Golvis Tavarez, Mindoval, Inc. We thank Microsoft for supporting this research through the Microsoft for Startups program. https://mindoval.com/ai-research