AutoWebWorld: Synthesizing Infinite Verifiable Web Environments via Finite State Machines Paper • 2602.14296 • Published 10 days ago • 47
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering Paper • 2503.06492 • Published Mar 9, 2025 • 11