MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues • Paper • 2510.17722 • Published Oct 20, 2025 • 20
IF-VidCap: Can Video Caption Models Follow Instructions? • Paper • 2510.18726 • Published Oct 21, 2025 • 26
DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation • Paper • 2604.14683 • Published 6 days ago • 32