Instruction-Following Evaluation in Function Calling for Large Language Models Paper • 2509.18420 • Published Sep 22, 2025
DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints Paper • 2601.18137 • Published Jan 26, 2026
One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning Paper • 2510.26167 • Published Oct 30, 2025