Towards General Agentic Intelligence via Environment Scaling Paper • 2509.13311 • Published Sep 16
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published 22 days ago
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models Paper • 2509.26628 • Published 22 days ago