Hammer: Robust Function-Calling for On-Device Language Models via Function Masking Paper • 2410.04587 • Published Oct 6, 2024 • 2
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs Paper • 2402.15491 • Published Feb 23, 2024 • 15
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets Paper • 2406.18518 • Published Jun 26, 2024 • 25
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay Paper • 2504.03601 • Published Apr 4, 2025 • 18
TOUCAN: Synthesizing 1.5M Tool-Agentic Data from Real-World MCP Environments Paper • 2510.01179 • Published Oct 1, 2025 • 28
Gorilla: Large Language Model Connected with Massive APIs Paper • 2305.15334 • Published May 24, 2023 • 6
Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks Paper • 2407.00121 • Published Jun 27, 2024
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Paper • 2307.16789 • Published Jul 31, 2023 • 102
MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published Aug 28, 2025 • 63
The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29, 2025 • 46
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 81
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 180
Recon-Act: A Self-Evolving Multi-Agent Browser-Use System via Web Reconnaissance, Tool Generation, and Task Execution Paper • 2509.21072 • Published Sep 25, 2025 • 16