ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Paper • 2506.10128 • Published Jun 11, 2025
This&That: Language-Gesture Controlled Video Generation for Robot Planning Paper • 2407.05530 • Published Jul 8, 2024
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement Paper • 2504.07934 • Published Apr 10, 2025
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors Paper • 2411.04125 • Published Nov 6, 2024
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web Paper • 2004.14973 • Published Apr 30, 2020