VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published Jan 22, 2025 • 90
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness Paper • 2502.14914 • Published Feb 19, 2025
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25, 2025 • 104
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models Paper • 2512.16561 • Published Dec 18, 2025 • 20
Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents Paper • 2410.13185 • Published Oct 17, 2024 • 5
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition Paper • 2407.05562 • Published Jul 8, 2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM Paper • 2501.00599 • Published Dec 31, 2024 • 46
RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing Paper • 2512.16864 • Published Dec 18, 2025 • 11
Cascade-DETR: Delving into High-Quality Universal Object Detection Paper • 2307.11035 • Published Jul 20, 2023
Gaussian Grouping: Segment and Edit Anything in 3D Scenes Paper • 2312.00732 • Published Dec 1, 2023 • 3
DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos Paper • 2405.02280 • Published May 3, 2024
SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking Paper • 2409.11235 • Published Sep 17, 2024
RobotArena ∞: Scalable Robot Benchmarking via Real-to-Sim Translation Paper • 2510.23571 • Published Oct 27, 2025 • 9
MotionEdit: Benchmarking and Learning Motion-Centric Image Editing Paper • 2512.10284 • Published Dec 11, 2025 • 26