FLES-1 v12b: Sparse Lexical Encoder (Most Efficient)
Paper: Closed-Loop FLOPS Regulation for Learned Sparse Retrieval (Golvis Tavarez, Mindoval, Inc.)
Model Description
Ultra-sparse variant of FLES-1. Only 139 non-zero terms per document at threshold=0.3, comparable to SPLADE's 125. 2.6x sparser than v14 with only 4% lower quality.
This model proved that step-interval CLFR works: a single λ adjustment after ~6,250 steps came within 1.7% of 12,500 per-step adjustments across 5 controller designs, while beating no control by +5.8%.
Metrics
nfcorpus (threshold=0.3)
| Metric | Value |
|---|---|
| NDCG@10 | 0.2923 |
| MRR | 0.5001 |
| Recall@100 | 0.2367 |
| Avg NNZ | 139 |
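To make the Avg NNZ figure concrete, here is a minimal sketch of threshold-based sparsification. The dense vocab-sized weight array and the helper name are illustrative assumptions, not the FLES-1 output format.

```python
import numpy as np

def nnz_at_threshold(doc_weights: np.ndarray, threshold: float) -> int:
    """Count surviving terms after dropping weights <= threshold.

    doc_weights: vocab-sized array of non-negative term weights
    (illustrative layout; the actual FLES-1 output may differ).
    """
    return int((doc_weights > threshold).sum())

# Hypothetical document vector: most vocab entries are zero.
rng = np.random.default_rng(0)
doc = np.zeros(30522)                                   # BERT vocab size
active = rng.choice(30522, size=600, replace=False)
doc[active] = rng.exponential(0.3, size=600)

print(nnz_at_threshold(doc, 0.0))  # raw NNZ: every active term survives
print(nnz_at_threshold(doc, 0.3))  # stricter threshold keeps far fewer terms
```

The same encoder output can therefore report very different NNZ values depending on the evaluation threshold, which is why the metrics above pin it to 0.3.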
Comparison
| Model | NDCG@10 | NNZ | Index size (1K docs) |
|---|---|---|---|
| fles1-v12b | 0.292 | 139 | 0.12 MB |
| fles1-v14 | 0.305 | 359 | 0.32 MB |
| BM25 (Pyserini) | 0.325 | – | – |
| SPLADE-Doc (no distillation) | 0.323 | – | – |
| SPLADE-cocondenser | 0.340 | 125 | ~0.11 MB |
| Dense 768d | – | 768 | 3.07 MB |
v12b matches SPLADE's sparsity (139 vs 125 NNZ) at 86% of SPLADE-cocondenser's quality (and 96% of v14's). The index is 25.6x smaller than dense vectors. Note: v12b is 10% behind Pyserini BM25 (0.325).
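A quick back-of-envelope check of the table's sizes, as a sketch assuming float32 dense vectors. The dense figure reproduces exactly; the sparse figures depend on the posting-list encoding (quantization, compression), so a naive 8-bytes-per-posting estimate overshoots the reported 0.12 MB.

```python
# Back-of-envelope index sizes for 1K documents (assumptions noted inline).
n_docs = 1_000

# Dense: 768 float32 dims per doc -> matches the table's 3.07 MB.
dense_mb = n_docs * 768 * 4 / 1e6
print(f"dense 768d: {dense_mb:.2f} MB")                      # 3.07 MB

# Sparse, naive encoding: 4-byte term id + 4-byte weight per posting.
# The reported 0.12 MB implies a much tighter on-disk encoding.
naive_sparse_mb = n_docs * 139 * 8 / 1e6
print(f"naive sparse (139 NNZ): {naive_sparse_mb:.2f} MB")   # 1.11 MB
```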
Special Property
v12b holds the best-ever NDCG@10 at threshold=0.0 (no sparsity filtering): 0.298. The model's raw retrieval quality is excellent; the low NNZ at t=0.3 means aggressive thresholding removes some useful terms.
v12b is also the foundation checkpoint for v14. It represents the "backbone quality" that v14 built on.
When to Use v12b Over v14
- Index size matters (2.6x smaller than v14)
- Query latency matters (fewer posting list lookups)
- You need SPLADE-level sparsity without distillation
- You're building a system that can trade a few points of quality for efficiency
Training
Foundation: fles1-v7 (1 generation from bert-base-uncased)
Data: 200,000 MS MARCO random negatives
Epochs: 2 (12,500 steps)
Loss: InfoNCE (τ=0.05) + L2 FLOPS (λ_d=0.00003) + anti-collapse (see the loss sketch after this block)
Controller: Step-interval CLFR, adjusted every ~6,250 steps (target_nnz_d=400, gain=0.1)
Epoch 1 → Epoch 2: λ_d adjusted from 0.000030 to 0.000034 (+13%)
Optimizer: AdamW, lr=2e-5, batch_size=32, 7 negatives
Hardware: 1× A100 80GB, ~2 hours
Note: v12b was trained with L2 FLOPS (before the L1 discovery). The L2 training produced more aggressive sparsity (NNZ=139), which is an advantage for efficiency.
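For readers who want the loss shape, here is a minimal PyTorch sketch of InfoNCE plus a squared-L2 FLOPS regularizer, assuming the SPLADE-style FLOPS form (squared mean activation per vocab term, summed). The anti-collapse term is not specified in this card and is omitted; all tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def infonce_flops_loss(q, d_pos, d_negs, tau=0.05, lambda_d=3e-5):
    """q: (B, V) query term weights; d_pos: (B, V); d_negs: (B, N, V).

    Sketch only: SPLADE-style FLOPS regularization on document vectors
    (sum over vocab of squared mean activation). The card's anti-collapse
    term is unspecified and omitted here.
    """
    pos = (q * d_pos).sum(-1, keepdim=True)             # (B, 1)
    neg = torch.einsum("bv,bnv->bn", q, d_negs)         # (B, N)
    logits = torch.cat([pos, neg], dim=1) / tau         # (B, 1+N)
    labels = torch.zeros(q.size(0), dtype=torch.long)   # positive at index 0
    contrastive = F.cross_entropy(logits, labels)

    docs = torch.cat([d_pos.unsqueeze(1), d_negs], dim=1)  # (B, 1+N, V)
    flops = (docs.mean(dim=(0, 1)) ** 2).sum()             # FLOPS regularizer
    return contrastive + lambda_d * flops
```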
The Step-Interval CLFR Proof
v12b is the proof that step-interval control works:
| Approach | Adjustments | NDCG@10 |
|---|---|---|
| No control (v12a) | 0 | 0.276 |
| Per-step (v11c) | 12,500 | 0.297 |
| Epoch-level (v12b) | 1 | 0.292 |
One adjustment. +5.8% over no control. Within 1.7% of 12,500 chaotic adjustments.
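A plausible shape for that single adjustment, consistent with the card's target_nnz_d=400, gain=0.1, and the reported +13% λ_d step. The exact CLFR update rule is not given here, so treat this proportional form, and the measured NNZ value, as assumptions.

```python
def clfr_step(lambda_d: float, measured_nnz: float,
              target_nnz: float = 400, gain: float = 0.1) -> float:
    """One step-interval CLFR adjustment (assumed proportional form).

    Raises lambda_d when documents are denser than target, lowers it
    when they are sparser. A sketch, not the verified FLES-1 rule.
    """
    error = (measured_nnz - target_nnz) / target_nnz
    return lambda_d * (1 + gain * error)

# With a hypothetical measured epoch-1 NNZ of ~920, this reproduces
# the card's +13% step: 0.000030 -> ~0.000034.
print(clfr_step(3e-5, measured_nnz=920))  # ~3.39e-05
```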
Usage
```python
from fles1_encoder import FLES1Encoder

encoder = FLES1Encoder.from_pretrained("mindoval/fles1-v12b")
sparse = encoder.encode("What is machine learning?")
```
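The return type of `encode` is not documented in this card. If it is a `{term: weight}` mapping, applying the evaluation threshold might look like the following sketch.

```python
# Sketch: apply the evaluation threshold (0.3) to the encoded output.
# Assumes `sparse` is a {term: weight} dict; the actual return type of
# FLES1Encoder.encode is not documented in this card.
filtered = {term: w for term, w in sparse.items() if w > 0.3}
print(len(filtered))  # terms kept at threshold=0.3 (avg ~139 per document)
```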
License
Apache 2.0
Golvis Tavarez, Mindoval, Inc. We thank Microsoft, Inc. for supporting this research through the Microsoft for Startups program. https://mindoval.com/ai-research