FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published 1 day ago • 55
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE Paper • 2510.13344 • Published 1 day ago • 56
UniFusion: Vision-Language Model as Unified Encoder in Image Generation Paper • 2510.12789 • Published 2 days ago • 14
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published 2 days ago • 31
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published 3 days ago • 134
BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions Paper • 2510.10666 • Published 4 days ago • 27
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 3 days ago • 132
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published 3 days ago • 145
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published 7 days ago • 109
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published 10 days ago • 125
PickStyle: Video-to-Video Style Transfer with Context-Style Adapters Paper • 2510.07546 • Published 8 days ago • 20
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction Paper • 2510.03117 • Published 13 days ago • 10
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published 7 days ago • 60