MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments Paper • 2604.13418 • Published 6 days ago
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems Paper • 2504.09763 • Published Apr 14, 2025
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1, 2024