Provable Benefits of In-Tool Learning for Large Language Models Paper • 2508.20755 • Published Aug 28 • 11
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published Aug 28 • 63
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench Paper • 2508.20931 • Published Aug 28 • 15
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning Paper • 2509.08755 • Published Sep 10 • 55
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs Paper • 2509.09677 • Published Sep 11 • 33
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published 20 days ago • 46