BrowserArena: Evaluating LLM Agents on Real-World Web Navigation Tasks • Paper • 2510.02418 • Published Oct 2
Adaptive Evaluations • Collection • Datasets for our paper, "Adaptively profiling models with task elicitation" (EMNLP 2025) • 1 item • Updated Sep 20