OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe • Paper • 2511.16334 • Published 9 days ago • 86
Scaling Spatial Intelligence with Multimodal Foundation Models • Paper • 2511.13719 • Published 12 days ago • 41
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image • Paper • 2511.13648 • Published 12 days ago • 51
Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals • Paper • 2510.27684 • Published 29 days ago • 22
The Quest for Generalizable Motion Generation: Data, Model, and Evaluation • Paper • 2510.26794 • Published 30 days ago • 26
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale • Paper • 2510.14979 • Published Oct 16 • 65
ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding • Paper • 2508.21496 • Published Aug 29 • 54
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study • Paper • 2508.13142 • Published Aug 18 • 34
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters • Paper • 2412.00174 • Published Nov 29, 2024 • 23
Trajectory Attention for Fine-grained Video Motion Control • Paper • 2411.19324 • Published Nov 28, 2024 • 13