FLES-1 v12b: Sparse Lexical Encoder (Most Efficient)

Paper: Closed-Loop FLOPS Regulation for Learned Sparse Retrieval (Golvis Tavarez, Mindoval, Inc.)

Model Description

An ultra-sparse variant of FLES-1: only 139 non-zero terms per document at threshold=0.3, comparable to SPLADE's 125. It is 2.6x sparser than v14 with only 4% lower quality.
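
The NNZ numbers are simply the count of terms whose weight survives the sparsity threshold. Below is a minimal sketch of that computation; the function name and the dict-of-term-weights output format are assumptions for illustration, not the actual FLES1Encoder output type.

def nnz_at_threshold(sparse_vec: dict, threshold: float = 0.3) -> int:
    """Count terms whose weight exceeds the sparsity threshold.

    Terms at or below the threshold are dropped before indexing, so this
    count drives posting-list length, index size, and query latency.
    """
    return sum(1 for weight in sparse_vec.values() if weight > threshold)

# Hypothetical document vector: two of the four terms survive t=0.3.
doc_vec = {"machine": 1.42, "learning": 1.10, "the": 0.08, "model": 0.21}
print(nnz_at_threshold(doc_vec))  # -> 2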

This model showed that step-interval CLFR works: across the 5 controller designs evaluated, a single lambda adjustment after ~6,250 steps beat the uncontrolled baseline by 5.8% NDCG@10 and came within 1.7% of 12,500 per-step adjustments.

Metrics

nfcorpus (threshold=0.3)

Metric        Value
NDCG@10       0.2923
MRR           0.5001
Recall@100    0.2367
Avg NNZ       139

Comparison

Model                          NDCG@10   NNZ   Index size (1K docs)
fles1-v12b                     0.292     139   0.12 MB
fles1-v14                      0.305     359   0.32 MB
BM25 (Pyserini)                0.325     -     -
SPLADE-Doc (no distillation)   0.323     -     -
SPLADE-cocondenser             0.340     125   ~0.11 MB
Dense 768d                     -         768   3.07 MB

v12b matches SPLADE's sparsity (139 vs 125 NNZ) at 96% of v14's quality and 86% of SPLADE-cocondenser's. The index is 25.6x smaller than dense 768d vectors. Note: v12b is 10% behind Pyserini BM25 (0.325).
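
The headline ratios follow directly from the table; a quick arithmetic check using the values above (nothing here is new data, only the table numbers recombined):

# Values copied from the comparison table.
v12b_ndcg, v14_ndcg, bm25_ndcg = 0.292, 0.305, 0.325
v12b_nnz, v14_nnz = 139, 359
v12b_index_mb, dense_index_mb = 0.12, 3.07

print(f"quality retained vs v14:  {v12b_ndcg / v14_ndcg:.1%}")                     # ~95.7%
print(f"gap to Pyserini BM25:     {(bm25_ndcg - v12b_ndcg) / bm25_ndcg:.1%}")      # ~10.2%
print(f"sparsity gain vs v14:     {v14_nnz / v12b_nnz:.1f}x fewer terms")          # 2.6x
print(f"index size vs dense 768d: {dense_index_mb / v12b_index_mb:.1f}x smaller")  # 25.6x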

Special Property

v12b holds the best-ever NDCG@10 at threshold=0.0 (no sparsity filtering): 0.298. The model's raw retrieval quality is excellent; the low NNZ at t=0.3 means aggressive thresholding removes some useful terms.

v12b is also the foundation checkpoint for v14. It represents the "backbone quality" that v14 built on.

When to Use v12b Over v14

  • Index size matters (2.6x smaller than v14)
  • Query latency matters (fewer posting list lookups)
  • You need SPLADE-level sparsity without distillation
  • You can accept a small quality loss in exchange for efficiency

Training

Foundation: fles1-v7 (1 generation from bert-base-uncased)
Data: 200,000 MS MARCO random negatives
Epochs: 2 (12,500 steps)
Loss: InfoNCE (τ=0.05) + L2 FLOPS (λ_d=0.00003) + anti-collapse
Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
  Epoch 1 → Epoch 2: λ_d adjusted from 0.000030 to 0.000034 (+13%)
Optimizer: AdamW, lr=2e-5, batch_size=32, 7 negatives
Hardware: 1× A100 80GB, ~2 hours

Note: v12b was trained with the L2 FLOPS penalty (before the L1 discovery). The L2 training produced more aggressive sparsity (NNZ=139), which is an advantage for efficiency.
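
For readers unfamiliar with the FLOPS regularizer, here is a minimal sketch of how such an objective is typically assembled: an InfoNCE contrastive term plus an L2-style (SPLADE-like) FLOPS penalty on the document activations, weighted by λ_d. This is a generic reconstruction under assumed conventions (PyTorch, in-batch negatives rather than the 7 explicit negatives used here, no anti-collapse term), not the actual FLES-1 training code; the function names are illustrative.

import torch
import torch.nn.functional as F

def flops_l2_penalty(doc_reps: torch.Tensor) -> torch.Tensor:
    """L2-style FLOPS regularizer: square the mean activation of each
    vocabulary term across the batch, then sum over the vocabulary.
    Driving mean activations toward zero sparsifies the representations."""
    return (doc_reps.abs().mean(dim=0) ** 2).sum()

def training_loss(query_reps: torch.Tensor, doc_reps: torch.Tensor,
                  tau: float = 0.05, lambda_d: float = 3e-5) -> torch.Tensor:
    """InfoNCE with in-batch negatives plus the FLOPS penalty.

    query_reps, doc_reps: (batch, vocab) sparse activations; row i of
    doc_reps is the positive for query i, other rows serve as negatives.
    """
    scores = (query_reps @ doc_reps.T) / tau                     # (batch, batch) similarities
    labels = torch.arange(scores.size(0), device=scores.device)  # diagonal entries are positives
    return F.cross_entropy(scores, labels) + lambda_d * flops_l2_penalty(doc_reps)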

The Step-Interval CLFR Proof

v12b is the proof that step-interval control works:

Approach             Adjustments   NDCG@10
No control (v12a)    0             0.276
Per-step (v11c)      12,500        0.297
Epoch-level (v12b)   1             0.292

One adjustment yields +5.8% over no control and lands within 1.7% of 12,500 per-step adjustments.
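
To make the interval-level adjustment concrete, the sketch below assumes a simple proportional update of λ_d toward target_nnz_d, applied once per interval. The update rule, the clfr_update name, and the observed-NNZ value are illustrative assumptions; the exact rule used for v12b is described in the paper and may differ.

def clfr_update(lambda_d: float, observed_nnz: float,
                target_nnz: float = 400.0, gain: float = 0.1) -> float:
    """One step-interval CLFR adjustment (assumed proportional form).

    Documents denser than the target push the FLOPS weight up; sparser
    documents relax it. Applied once per interval, not once per step.
    """
    relative_error = (observed_nnz - target_nnz) / target_nnz
    return lambda_d * (1.0 + gain * relative_error)

# Hypothetical trace: a single adjustment at the epoch boundary, as in v12b.
lambda_d = 3.0e-5
observed_avg_nnz = 930.0  # illustrative value, not a measurement
lambda_d = clfr_update(lambda_d, observed_avg_nnz)
print(f"lambda_d after epoch 1: {lambda_d:.2e}")  # ~3.40e-05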

Usage

from fles1_encoder import FLES1Encoder

# Load the v12b checkpoint and encode a query into a sparse lexical vector.
encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v12b")
sparse = encoder.encode("What is machine learning?")

License

Apache 2.0

Golvis Tavarez, Mindoval, Inc. We thank Microsoft, Inc. for supporting this research through the Microsoft for Startups program. https://mindoval.com/ai-research
