LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning
Abstract
Length-aware Sampling for Policy Optimization (LSPO) is a meta-RLVR algorithm that dynamically selects training data based on response length, improving learning effectiveness in large language models.
Since the release of DeepSeek-R1, reinforcement learning with verifiable rewards (RLVR) has become a central approach for training large language models (LLMs) on reasoning tasks. Recent work has largely focused on modifying loss functions to make RLVR more efficient and effective. In this paper, motivated by studies of overthinking in LLMs, we propose Length-aware Sampling for Policy Optimization (LSPO), a novel meta-RLVR algorithm that dynamically selects training data at each step based on average response length. We evaluate LSPO across multiple base models and datasets, demonstrating that it consistently improves learning effectiveness. In addition, we conduct a detailed ablation study examining alternative ways of incorporating length signals into dynamic sampling, offering further insights and highlighting promising directions for future research.
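The abstract only states that LSPO selects training data at each step based on average response length; the exact selection rule is not given here. The sketch below is one plausible reading, with a hypothetical length cutoff (`max_mean_len`) and a stub rollout function standing in for the actual sampler; it is illustrative, not the published algorithm.

```python
# Hypothetical sketch of length-aware dynamic sampling for an RLVR step.
# Assumption (not from the paper): prompts are kept only when the mean
# token length of their sampled rollout group stays under a cutoff.

def mean_length(rollouts):
    """Average token length over a group of sampled responses."""
    return sum(len(r) for r in rollouts) / len(rollouts)

def lspo_select(batch, rollout_fn, max_mean_len=512):
    """Filter a prompt batch by the average length of its rollouts.

    batch       : list of prompts
    rollout_fn  : prompt -> list of tokenized responses (e.g. 8 samples)
    max_mean_len: hypothetical cutoff on mean response length (tokens)
    """
    selected = []
    for prompt in batch:
        rollouts = rollout_fn(prompt)
        if mean_length(rollouts) <= max_mean_len:
            selected.append((prompt, rollouts))
    return selected

# Toy usage with a stub sampler whose response length scales with the prompt:
stub = lambda p: [[0] * (100 * len(p))]
kept = lspo_select(["ab", "abcdefgh"], stub, max_mean_len=512)
print([p for p, _ in kept])  # only prompts whose rollouts stayed short
```

Selected prompt/rollout pairs would then feed the usual policy-optimization update; the filtering is a data-selection wrapper, orthogonal to the loss function.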
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse (2025)
- DCPO: Dynamic Clipping Policy Optimization (2025)
- CLPO: Curriculum Learning meets Policy Optimization for LLM Reasoning (2025)
- G2RPO-A: Guided Group Relative Policy Optimization with Adaptive Guidance (2025)
- VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models (2025)
- Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models (2025)
- Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning (2025)