ScaleEnv: Scaling Environment Synthesis from Scratch for Generalist Interactive Tool-Use Agent Training
Abstract
The ScaleEnv framework generates interactive environments from scratch to improve agent generalization through diverse domain scaling and verified task completion.
Training generalist agents capable of adapting to diverse scenarios requires interactive environments for self-exploration. However, such environments remain critically scarce, and existing synthesis methods are limited in both environmental diversity and scalability. To address these challenges, we introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. Specifically, ScaleEnv ensures environment reliability through procedural testing, and guarantees task completeness and solvability via tool dependency graph expansion and executable action verification. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks such as $\tau^2$-Bench and VitaBench, highlighting strong generalization capabilities. Furthermore, we investigate the relationship between the number of training domains and model generalization performance, providing empirical evidence that scaling environmental diversity is critical for robust agent learning.
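The abstract mentions tool dependency graph expansion and executable action verification without further detail on this page. The sketch below is a minimal, hypothetical illustration of what such checks could look like; all names here (ToolSpec, expand_task, verify_actions, the toy retail tools) are our own assumptions and are not taken from the paper.

```python
# Hypothetical sketch (not the ScaleEnv implementation): expand a task along a tool
# dependency graph, then verify that an executable action sequence reaches the goal state.
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str
    depends_on: list = field(default_factory=list)  # tools whose outputs this tool needs


def expand_task(target_tool: str, tools: dict) -> list:
    """Topologically order the dependencies of `target_tool`, so a task ending with it
    also exercises every prerequisite tool (dependency-graph expansion)."""
    order, seen = [], set()

    def visit(name: str):
        if name in seen:
            return
        seen.add(name)
        for dep in tools[name].depends_on:
            visit(dep)
        order.append(name)

    visit(target_tool)
    return order


def verify_actions(actions: list, executors: dict, initial_state: dict, goal: dict) -> bool:
    """Replay the action sequence against an executable environment state and check that
    the goal condition holds (executable action verification)."""
    state = dict(initial_state)
    for name in actions:
        state = executors[name](state)  # each executor returns an updated copy of the state
    return all(state.get(k) == v for k, v in goal.items())


if __name__ == "__main__":
    # Toy retail-style domain: a refund is only possible after the order is found and cancelled.
    tools = {
        "find_order": ToolSpec("find_order"),
        "cancel_order": ToolSpec("cancel_order", ["find_order"]),
        "issue_refund": ToolSpec("issue_refund", ["cancel_order"]),
    }
    executors = {
        "find_order": lambda s: {**s, "order_found": True},
        "cancel_order": lambda s: {**s, "cancelled": s.get("order_found", False)},
        "issue_refund": lambda s: {**s, "refunded": s.get("cancelled", False)},
    }

    plan = expand_task("issue_refund", tools)  # ['find_order', 'cancel_order', 'issue_refund']
    solvable = verify_actions(plan, executors, {}, {"refunded": True})
    print(plan, solvable)  # a synthesized task would be kept only if it verifies as solvable
```

Under this reading, only tasks whose expanded tool chain can be executed to completion would be retained, which is one plausible way to guarantee the "completeness and solvability" the abstract claims.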
Community
We introduce ScaleEnv, a framework that constructs fully interactive environments and verifiable tasks entirely from scratch. By enabling agents to learn through exploration within ScaleEnv, we demonstrate significant performance improvements on unseen, multi-turn tool-use benchmarks such as $\tau^2$-Bench and VitaBench, highlighting strong generalization capabilities.
This is an automated message from the Librarian Bot. The following papers, recommended by the Semantic Scholar API, are similar to this one:
- EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience (2026)
- TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents (2026)
- Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing (2025)
- From Failure to Mastery: Generating Hard Samples for Tool-use Agents (2026)
- From Self-Evolving Synthetic Data to Verifiable-Reward RL: Post-Training Multi-turn Interactive Tool-Using Agents (2026)
- EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis (2026)
- ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas (2026)