VMoBA: Mixture-of-Block Attention for Video Diffusion Models Paper • 2506.23858 • Published 26 days ago • 30
SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution Paper • 2506.19838 • Published Jun 24 • 12
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper • 2506.01943 • Published Jun 2 • 24
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction Paper • 2505.22613 • Published May 28 • 7
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published May 27 • 39
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published May 22 • 22
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content Paper • 2410.08260 • Published Oct 10, 2024
SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning Paper • 2504.00396 • Published Apr 1 • 4
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment Paper • 2503.23907 • Published Mar 31 • 2
Position: Interactive Generative Video as Next-Generation Game Engine Paper • 2503.17359 • Published Mar 21 • 62
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Paper • 2503.19907 • Published Mar 25 • 8
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 77
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Paper • 2503.14487 • Published Mar 18 • 27