Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward Paper • 2510.03222 • Published 23 days ago • 44
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published 20 days ago • 92
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 20 days ago • 446
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published about 1 month ago • 132
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 183
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published Sep 10 • 56
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents Paper • 2510.09577 • Published 16 days ago • 6