OPSD Experiment Results

Reproduction of OPSD (On-Policy Self-Distillation) on Qwen3-1.7B, Qwen3-4B, and Qwen3-8B, compared against the base model, SFT, and GRPO.

Results (Avg@12)

Qwen3-1.7B

Method	AIME24	AIME25	HMMT25
Base	47.2%	35.3%	21.9%
OPSD (best)	49.2%	37.5%	24.4%
SFT (best)	37.5%	30.8%	19.2%
GRPO (best)	47.8%	35.0%	22.8%

Qwen3-4B

Method	AIME24	AIME25	HMMT25
Base	71.1%	60.0%	38.6%
OPSD (best)	62.2%	57.2%	34.2%
SFT (best)	62.5%	58.1%	33.3%
GRPO (best)	68.9%	65.0%	41.9%

Qwen3-8B

Method	AIME24	AIME25	HMMT25
Base	72.8%	61.7%	38.6%
OPSD (best)	69.4%	63.3%	38.6%
SFT (best)	69.2%	60.3%	36.1%
GRPO (best)	72.2%	65.8%	40.8%
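Avg@12, the metric reported in the results tables, conventionally means sampling 12 responses per problem, grading each one, and averaging accuracy over all samples. The sketch below assumes that convention. This card does not state the sampling settings or the grader, and `avg_at_k` is a hypothetical helper, not part of the OPSD code.

```python
from statistics import mean


def avg_at_k(correct: list[list[bool]]) -> float:
    """Hypothetical Avg@k helper (assumed convention, not from the OPSD code).

    `correct[i][j]` is True if sample j for problem i was graded correct.
    Computes each problem's fraction of correct samples, averages over
    problems, and returns the result as a percentage.
    """
    if not correct:
        raise ValueError("need at least one problem")
    k = len(correct[0])
    # Every problem must have the same number of samples (k = 12 for Avg@12).
    if k == 0 or any(len(samples) != k for samples in correct):
        raise ValueError("every problem must have the same, nonzero number of samples")
    per_problem = [sum(samples) / k for samples in correct]
    return 100.0 * mean(per_problem)


# Two made-up problems with 12 samples each: 6/12 and 3/12 correct.
print(avg_at_k([[True] * 6 + [False] * 6, [True] * 3 + [False] * 9]))
```

With equal k per problem, this equals total correct samples divided by total samples. On a 30-problem benchmark such as AIME24, k = 12 gives 360 graded samples, so one additional correct sample moves the score by about 0.28 points.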
