OPSD Experiment Results

Reproduction of OPSD (On-Policy Self-Distillation) on Qwen3-1.7B, Qwen3-4B, and Qwen3-8B.

Results (Avg@12)

Qwen3-1.7B

| Method | AIME24 | AIME25 | HMMT25 |
|---|---|---|---|
| Base | 47.2% | 35.3% | 21.9% |
| OPSD (best) | 49.2% | 37.5% | 24.4% |
| SFT (best) | 37.5% | 30.8% | 19.2% |
| GRPO (best) | 47.8% | 35.0% | 22.8% |

Qwen3-4B

| Method | AIME24 | AIME25 | HMMT25 |
|---|---|---|---|
| Base | 71.1% | 60.0% | 38.6% |
| OPSD (best) | 62.2% | 57.2% | 34.2% |
| SFT (best) | 62.5% | 58.1% | 33.3% |
| GRPO (best) | 68.9% | 65.0% | 41.9% |

Qwen3-8B

| Method | AIME24 | AIME25 | HMMT25 |
|---|---|---|---|
| Base | 72.8% | 61.7% | 38.6% |
| OPSD (best) | 69.4% | 63.3% | 38.6% |
| SFT (best) | 69.2% | 60.3% | 36.1% |
| GRPO (best) | 72.2% | 65.8% | 40.8% |
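
The Avg@12 figures above are most naturally read as the per-problem accuracy over 12 sampled completions, averaged across all problems in a benchmark. The following is a minimal sketch of that computation under this reading; the function name `avg_at_k` and the input layout (one list of correctness flags per problem) are illustrative assumptions, not taken from the evaluation code used for these results.

```python
from statistics import mean
from typing import Sequence


def avg_at_k(correct: Sequence[Sequence[bool]], k: int) -> float:
    """Mean accuracy over k samples per problem, averaged over problems.

    `correct[i][j]` is True if sample j for problem i was graded correct.
    The layout is a hypothetical one chosen for this sketch.
    """
    if not correct:
        raise ValueError("need at least one problem")
    for i, samples in enumerate(correct):
        if len(samples) != k:
            raise ValueError(f"problem {i} has {len(samples)} samples, expected {k}")
    # Per-problem accuracy first, then an unweighted mean across problems.
    return mean(sum(samples) / k for samples in correct)


# Toy example: one problem always solved, one never solved.
toy = [[True] * 12, [False] * 12]
print(f"{avg_at_k(toy, k=12):.1%}")
```

Because every problem contributes exactly k samples, this is the same number as the fraction of correct completions over all samples pooled together.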

Setup

  • All methods: learning rate 5e-6, batch size 32, LoRA (r=64, alpha=128), 200 training steps
  • Eval: val_n=12 (12 samples per problem, hence Avg@12), temperature=1.0, thinking mode enabled
  • Data: siyanzhao/Openthoughts_math_30k_opsd
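
For reference, the settings above can be collected into a small configuration object. This is only a sketch: the class and field names (`TrainConfig`, `EvalConfig`, `lora_scaling`, `examples_seen`) are hypothetical and do not come from the training scripts behind these results. The two derived quantities rest on stated assumptions: standard LoRA scaling of alpha / r, and a batch size counted in examples per optimizer step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the Setup list; field names are illustrative.
    learning_rate: float = 5e-6
    batch_size: int = 32
    lora_r: int = 64
    lora_alpha: int = 128
    max_steps: int = 200
    dataset: str = "siyanzhao/Openthoughts_math_30k_opsd"

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r (assumed here).
        return self.lora_alpha / self.lora_r

    @property
    def examples_seen(self) -> int:
        # Assumes batch_size counts examples per optimizer step.
        return self.batch_size * self.max_steps


@dataclass(frozen=True)
class EvalConfig:
    n_samples: int = 12        # val_n
    temperature: float = 1.0
    enable_thinking: bool = True


train = TrainConfig()
print(train.lora_scaling, train.examples_seen)
```

With a library such as PEFT, the two LoRA fields would correspond to the `r` and `lora_alpha` arguments of `LoraConfig`; the remaining LoRA options (target modules, dropout) are not stated in the setup and are left out here.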

Reference

Self-Distilled Reasoner: On-Policy Self-Distillation for LLMs
