A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning Paper • 2512.14442 • Published 11 days ago • 10
UniUGP: Unifying Understanding, Generation, and Planing For End-to-end Autonomous Driving Paper • 2512.09864 • Published 16 days ago • 10
DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation Paper • 2511.23127 • Published 29 days ago • 43
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models Paper • 2511.13704 • Published Nov 17 • 42
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs Paper • 2510.09507 • Published Oct 10 • 10
Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9 • 83
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 84
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 190
MolmoAct: Action Reasoning Models that can Reason in Space Paper • 2508.07917 • Published Aug 11 • 44
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20 • 133