Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models Paper • 2503.06749 • Published Mar 9
Visual-TableQA: Open-Domain Benchmark for Reasoning over Table Images Paper • 2509.07966 • Published Sep 9
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21
PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images Paper • 2509.25185 • Published 26 days ago
Seeing Culture: A Benchmark for Visual Reasoning and Grounding Paper • 2509.16517 • Published Sep 20
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception Paper • 2509.21100 • Published about 1 month ago
Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles Paper • 2505.23590 • Published May 29