4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation Paper • 2512.17012 • Published 9 days ago • 42
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics Paper • 2512.13660 • Published 12 days ago • 37
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 22 days ago • 38
Geometrically-Constrained Agent for Spatial Reasoning Paper • 2511.22659 • Published 30 days ago • 40
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics Paper • 2510.07181 • Published Oct 8 • 1
LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions Paper • 2510.08211 • Published Oct 9 • 22
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Paper • 2509.18905 • Published Sep 23 • 29
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28 • 77
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space Paper • 2508.19247 • Published Aug 26 • 43
RoboRefer & RefSpatial Collection • RoboRefer weights, RefSpatial Dataset and RefSpatial-Bench • 9 items • Updated Oct 24 • 3
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data Paper • 2507.07095 • Published Jul 9 • 55
Use Property-Based Testing to Bridge LLM Code Generation and Validation Paper • 2506.18315 • Published Jun 23 • 11
AnimaX: Animating the Inanimate in 3D with Joint Video-Pose Diffusion Models Paper • 2506.19851 • Published Jun 24 • 60