Diffusion Transformers with Representation Autoencoders • Paper • 2510.11690 • Published 8 days ago • 155
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model • Paper • 2510.12276 • Published 8 days ago • 139
FlashWorld: High-quality 3D Scene Generation within Seconds • Paper • 2510.13678 • Published 6 days ago • 66
ImagerySearch: Adaptive Test-Time Search for Video Generation Beyond Semantic Dependency Constraints • Paper • 2510.14847 • Published 5 days ago • 52
BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities • Paper • 2510.08759 • Published 12 days ago • 44
PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning • Paper • 2510.13809 • Published 6 days ago • 35