Demystifying Reinforcement Learning in Agentic Reasoning
Abstract
Agentic reinforcement learning enhances LLMs' reasoning ability through real end-to-end tool-use trajectories, exploration-friendly RL techniques, and a deliberative tool-use strategy, achieving strong performance with smaller models.
Recently, the emergence of agentic RL has shown that RL can also effectively improve the agentic reasoning ability of LLMs, yet the key design principles and optimal practices remain unclear. In this work, we conduct a comprehensive and systematic investigation to demystify reinforcement learning in agentic reasoning from three key perspectives: data, algorithm, and reasoning mode. We highlight our key insights: (i) Replacing stitched synthetic trajectories with real end-to-end tool-use trajectories yields a far stronger SFT initialization, and high-diversity, model-aware datasets sustain exploration and markedly improve RL performance. (ii) Exploration-friendly techniques, such as clip-higher, overlong reward shaping, and maintaining adequate policy entropy, are crucial for agentic RL and improve training efficiency. (iii) A deliberative strategy with fewer tool calls outperforms frequent tool calls or verbose self-reasoning, improving both tool efficiency and final accuracy. Together, these simple practices consistently enhance agentic reasoning and training efficiency, achieving strong results on challenging benchmarks with smaller models and establishing a practical baseline for future agentic RL research. Beyond these empirical insights, we further contribute a high-quality, real end-to-end agentic SFT dataset along with a high-quality RL dataset, and demonstrate the effectiveness of our insights in boosting the agentic reasoning ability of LLMs across four challenging benchmarks, namely AIME2024/AIME2025, GPQA-Diamond, and LiveCodeBench-v6. With our recipes, 4B-sized models can achieve agentic reasoning performance superior to 32B-sized models. Code and models: https://github.com/Gen-Verse/Open-AgentRL
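To make insight (ii) concrete, below is a minimal sketch of the two named techniques, clip-higher and overlong reward shaping, written against a generic PPO/GRPO-style surrogate. It is not the paper's implementation: the function names, default thresholds, and the DAPO-style linear length penalty are assumptions for illustration only.

# Minimal sketch (not the paper's code) of two exploration-friendly techniques
# named in insight (ii). All names, defaults, and thresholds are illustrative.
import torch


def clip_higher_policy_loss(logp_new, logp_old, advantages,
                            eps_low: float = 0.2, eps_high: float = 0.28):
    """PPO/GRPO-style clipped surrogate with an asymmetric clip range.

    Using eps_high > eps_low ("clip higher") leaves more headroom for raising
    the probability of low-probability tokens, which helps keep policy entropy
    from collapsing during RL training.
    """
    ratio = torch.exp(logp_new - logp_old)                       # per-token importance ratio
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)  # asymmetric clipping
    return -torch.min(ratio * advantages, clipped * advantages).mean()


def overlong_reward_shaping(reward: float, length: int,
                            max_len: int = 8192, buffer: int = 1024) -> float:
    """Soft length penalty (a DAPO-style assumption): full reward below
    max_len - buffer, a linear ramp-down inside the buffer, and a -1 penalty
    past max_len, so truncated trajectories are not scored as hard failures."""
    if length <= max_len - buffer:
        return reward
    if length <= max_len:
        return reward - (length - (max_len - buffer)) / buffer
    return reward - 1.0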
Community
The following papers, similar to this one, were recommended by Librarian Bot via the Semantic Scholar API:
- rStar2-Agent: Agentic Reasoning Technical Report (2025)
- Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning (2025)
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use (2025)
- In-the-Flow Agentic System Optimization for Effective Planning and Tool Use (2025)
- Plan Then Action: High-Level Planning Guidance Reinforcement Learning for LLM Reasoning (2025)
- Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning (2025)
- Tool-Augmented Policy Optimization: Synergizing Reasoning and Adaptive Tool Use with Reinforcement Learning (2025)