VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published 22 days ago • 70
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning Paper • 2506.05331 • Published Jun 5 • 13
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published May 1 • 45
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13 • 28
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 117