Pretraining Frame Preservation in Autoregressive Video Memory Compression • Paper • 2512.23851 • Published 12 days ago
IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning • Paper • 2512.15635 • Published 24 days ago
LongVie 2: Multimodal Controllable Ultra-Long Video World Model • Paper • 2512.13604 • Published 26 days ago
Qwen-Image-i2L: Training Strategies for Image-to-LoRA Generation • Article • 26 days ago
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models • Paper • 2512.02014 • Published Dec 1, 2025
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation • Paper • 2511.14993 • Published Nov 19, 2025
HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives • Paper • 2510.20822 • Published Oct 23, 2025
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset • Paper • 2510.15742 • Published Oct 17, 2025
∇NABLA: Neighborhood Adaptive Block-Level Attention • Paper • 2507.13546 • Published Jul 17, 2025
Alchemist • Collection • Dataset and checkpoints for the paper "Alchemist: Turning Public Text-to-Image Data into Generative Gold" • 8 items • Updated Oct 16, 2025
Wan: Open and Advanced Large-Scale Video Generative Models • Paper • 2503.20314 • Published Mar 26, 2025
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters • Paper • 2408.03314 • Published Aug 6, 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis • Paper • 2412.01819 • Published Dec 2, 2024
CogVLM2: Visual Language Models for Image and Video Understanding • Paper • 2408.16500 • Published Aug 29, 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling • Paper • 2408.16532 • Published Aug 29, 2024