arxiv:2605.09806

LEAD: Length-Efficient Adaptive and Dynamic Reasoning for Large Language Models

Published on May 10 · Submitted by Songtao Wei on May 14
Abstract

LEAD is a method that dynamically adapts reasoning efficiency during training by using online calibration of correctness-efficiency trade-offs and adaptive problem-specific length targets to improve mathematical reasoning accuracy and efficiency.

AI-generated summary

Large reasoning models, such as OpenAI o1 and DeepSeek-R1, tend to become increasingly verbose as their reasoning capabilities improve. These inflated Chain-of-Thought (CoT) trajectories often exceed what the underlying problems require, wasting compute, latency, and context budgets. While introducing length-based efficiency rewards during reinforcement learning offers a natural remedy, existing methods struggle with two fundamental challenges: the optimal balance between correctness and efficiency is non-stationary throughout training, and intrinsic reasoning budgets vary drastically across problems. Relying on static reward weights and global length constraints inevitably forces a compromise between degraded accuracy and unrealized compression. To overcome these limitations, we propose LEAD (Length-Efficient Adaptive and Dynamic reasoning), a method that replaces static heuristics with online, self-adaptive mechanisms. LEAD dynamically calibrates the correctness-efficiency trade-off at each step using a Potential-Scaled Instability, directing optimization capacity to the most informative learning signal. Furthermore, it estimates an adaptive per-problem target length online based on the model's own correct rollouts, applying a symmetric efficiency reward that penalizes both overthinking and over-compression. Evaluated on five mathematical reasoning benchmarks, LEAD achieves the highest accuracy and Accuracy-Efficiency Score among RL-trained efficient-reasoning methods while producing substantially shorter outputs than the base model.

Community

Paper author Paper submitter

🚀 Excited to share LEAD: Length-Efficient Adaptive and Dynamic Reasoning for LLMs!

Reasoning LLMs are great at solving hard problems, but they tend to "think longer to think better", even when the problem doesn't need it. Length-penalty RL fixes this in principle, but in practice every existing recipe makes two static assumptions that the underlying signals don't honor:
1. Fixed reward weights, although the correctness-vs-efficiency balance is non-stationary across training.
2. A single global length budget, although reasoning budgets vary by orders of magnitude across prompts.

LEAD replaces both with online, self-calibrating mechanisms:
๐ŸŽ›๏ธ A Potential-Scaled Instability (PSI) controller adapts the weights every step from each reward's within-group variance and headroom-to-saturation โ€” implementing an explore-then-anchor curriculum automatically.

๐Ÿ“ A per-problem online target estimated from the model's own correct rollouts, with a symmetric efficiency reward that penalizes over-compression as well as overthinking.

Headline results (DeepSeek-R1-Distill-Qwen-1.5B, 4K token budget, 5 math benchmarks): LEAD reaches 53.36 accuracy / 3,714 tokens / +0.68 AES, making it the only method that improves accuracy over the base model while also shortening outputs.

📄 Paper: https://arxiv.org/abs/2605.09806
💻 Code: https://github.com/CrazyMint/LEAD
🤗 Model: https://huggingface.co/Kotom1/math_lead_4k_deepseek-r1-1.5b


Get this paper in your agent:

hf papers read 2605.09806
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
