FLES-1 v12b: Sparse Lexical Encoder (Most Efficient)
Paper: Closed-Loop FLOPS Regulation for Learned Sparse Retrieval (Golvis Tavarez, Mindoval, Inc.)
Model Description
Ultra-sparse variant of FLES-1. Only 139 non-zero terms per document at threshold=0.3, comparable to SPLADE's 125. 2.6x sparser than v14 with only 4% lower quality.
This model proved that step-interval CLFR works: a single λ adjustment after ~6,250 steps came within 1.7% of 12,500 per-step adjustments across 5 controller designs, while beating no control by +5.8%.
Metrics
nfcorpus (threshold=0.3)
| Metric | Value |
|---|---|
| NDCG@10 | 0.2923 |
| MRR | 0.5001 |
| Recall@100 | 0.2367 |
| Avg NNZ | 139 |
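To make the Avg NNZ figure concrete, here is a minimal sketch of threshold-based sparsification. The dense vocab-sized weight array and the helper name are illustrative assumptions, not the FLES-1 output format.

```python
import numpy as np

def nnz_at_threshold(doc_weights: np.ndarray, threshold: float) -> int:
    """Count surviving terms after dropping weights <= threshold.

    doc_weights: vocab-sized array of non-negative term weights
    (illustrative layout; the actual FLES-1 output may differ).
    """
    return int((doc_weights > threshold).sum())

# Hypothetical document vector: most vocab entries are zero.
rng = np.random.default_rng(0)
doc = np.zeros(30522)                                   # BERT vocab size
active = rng.choice(30522, size=600, replace=False)
doc[active] = rng.exponential(0.3, size=600)

print(nnz_at_threshold(doc, 0.0))  # raw NNZ: every active term survives
print(nnz_at_threshold(doc, 0.3))  # stricter threshold keeps far fewer terms
```

The same encoder output can therefore report very different NNZ values depending on the evaluation threshold, which is why the metrics above pin it to 0.3.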
Comparison
| Model | NDCG@10 | NNZ | Index size (1K docs) |
|---|---|---|---|
| fles1-v12b | 0.292 | 139 | 0.12 MB |
| fles1-v14 | 0.305 | 359 | 0.32 MB |
| BM25 (Pyserini) | 0.325 | – | – |
| SPLADE-Doc (no distillation) | 0.323 | – | – |
| SPLADE-cocondenser | 0.340 | 125 | ~0.11 MB |
| Dense 768d | – | 768 | 3.07 MB |
v12b matches SPLADE's sparsity (139 vs 125 NNZ) at 86% of SPLADE-cocondenser's quality (and 96% of v14's). The index is 25.6x smaller than dense vectors. Note: v12b is 10% behind Pyserini BM25 (0.325).
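A quick back-of-envelope check of the table's sizes, as a sketch assuming float32 dense vectors. The dense figure reproduces exactly; the sparse figures depend on the posting-list encoding (quantization, compression), so a naive 8-bytes-per-posting estimate overshoots the reported 0.12 MB.

```python
# Back-of-envelope index sizes for 1K documents (assumptions noted inline).
n_docs = 1_000

# Dense: 768 float32 dims per doc -> matches the table's 3.07 MB.
dense_mb = n_docs * 768 * 4 / 1e6
print(f"dense 768d: {dense_mb:.2f} MB")                      # 3.07 MB

# Sparse, naive encoding: 4-byte term id + 4-byte weight per posting.
# The reported 0.12 MB implies a much tighter on-disk encoding.
naive_sparse_mb = n_docs * 139 * 8 / 1e6
print(f"naive sparse (139 NNZ): {naive_sparse_mb:.2f} MB")   # 1.11 MB
```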
Special Property
v12b holds the best-ever NDCG@10 at threshold=0.0 (no sparsity filtering): 0.298. The model's raw retrieval quality is excellent; the low NNZ at t=0.3 means aggressive thresholding removes some useful terms.
v12b is also the foundation checkpoint for v14. It represents the "backbone quality" that v14 built on.
When to Use v12b Over v14
- Index size matters (2.6x smaller than v14)
- Query latency matters (fewer posting list lookups)
- You need SPLADE-level sparsity without distillation
- You're building a system that can trade a few points of quality for efficiency
Training
Foundation: fles1-v7 (1 generation from bert-base-uncased)
Data: 200,000 MS MARCO random negatives
Epochs: 2 (12,500 steps)
Loss: InfoNCE (τ=0.05) + L2 FLOPS (λ_d=0.00003) + anti-collapse (see the loss sketch after this block)
Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
Epoch 1 → Epoch 2: λ_d adjusted from 0.000030 to 0.000034 (+13%)
Optimizer: AdamW, lr=2e-5, batch_size=32, 7 negatives
Hardware: 1× A100 80GB, ~2 hours
Note: v12b was trained with L2 FLOPS (before the L1 discovery). The L2 training produced more aggressive sparsity (NNZ=139), which is an advantage for efficiency.
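For readers who want the loss shape, here is a minimal PyTorch sketch of InfoNCE plus a squared-L2 FLOPS regularizer, assuming the SPLADE-style FLOPS form (squared mean activation per vocab term, summed). The anti-collapse term is not specified in this card and is omitted; all tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def infonce_flops_loss(q, d_pos, d_negs, tau=0.05, lambda_d=3e-5):
    """q: (B, V) query term weights; d_pos: (B, V); d_negs: (B, N, V).

    Sketch only: SPLADE-style FLOPS regularization on document vectors
    (sum over vocab of squared mean activation). The card's anti-collapse
    term is unspecified and omitted here.
    """
    pos = (q * d_pos).sum(-1, keepdim=True)             # (B, 1)
    neg = torch.einsum("bv,bnv->bn", q, d_negs)         # (B, N)
    logits = torch.cat([pos, neg], dim=1) / tau         # (B, 1+N)
    labels = torch.zeros(q.size(0), dtype=torch.long)   # positive at index 0
    contrastive = F.cross_entropy(logits, labels)

    docs = torch.cat([d_pos.unsqueeze(1), d_negs], dim=1)  # (B, 1+N, V)
    flops = (docs.mean(dim=(0, 1)) ** 2).sum()             # FLOPS regularizer
    return contrastive + lambda_d * flops
```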
The Step-Interval CLFR Proof
v12b is the proof that step-interval control works:
| Approach | Adjustments | NDCG@10 |
|---|---|---|
| No control (v12a) | 0 | 0.276 |
| Per-step (v11c) | 12,500 | 0.297 |
| Epoch-level (v12b) | 1 | 0.292 |
One adjustment. +5.8% over no control. Within 1.7% of 12,500 chaotic adjustments.
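A plausible shape for that single adjustment, consistent with the card's target_nnz_d=400, gain=0.1, and the reported +13% λ_d step. The exact CLFR update rule is not given here, so treat this proportional form, and the measured NNZ value, as assumptions.

```python
def clfr_step(lambda_d: float, measured_nnz: float,
              target_nnz: float = 400, gain: float = 0.1) -> float:
    """One step-interval CLFR adjustment (assumed proportional form).

    Raises lambda_d when documents are denser than target, lowers it
    when they are sparser. A sketch, not the verified FLES-1 rule.
    """
    error = (measured_nnz - target_nnz) / target_nnz
    return lambda_d * (1 + gain * error)

# With a hypothetical measured epoch-1 NNZ of ~920, this reproduces
# the card's +13% step: 0.000030 -> ~0.000034.
print(clfr_step(3e-5, measured_nnz=920))  # ~3.39e-05
```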
Usage
```python
from fles1_encoder import FLES1Encoder

encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v12b")
sparse = encoder.encode("What is machine learning?")
```
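The return type of `encode` is not documented in this card. If it is a `{term: weight}` mapping, applying the evaluation threshold might look like the following sketch.

```python
# Sketch: apply the evaluation threshold (0.3) to the encoded output.
# Assumes `sparse` is a {term: weight} dict; the actual return type of
# FLES1Encoder.encode is not documented in this card.
filtered = {term: w for term, w in sparse.items() if w > 0.3}
print(len(filtered))  # terms kept at threshold=0.3 (avg ~139 per document)
```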
License
Apache 2.0
Golvis Tavarez, Mindoval, Inc. We thank Microsoft, Inc. for supporting this research through the Microsoft for Startups program. https://mindoval.com/ai-research