UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published 9 days ago • 37
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published 13 days ago • 218
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30 • 113
DINOv3 Collection DINOv3: foundation models producing excellent dense features, outperforming SotA w/o fine-tuning - https://arxiv.org/abs/2508.10104 • 13 items • Updated Aug 21 • 394
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations Paper • 2509.09676 • Published Sep 11 • 32
A Survey of Reinforcement Learning for Large Reasoning Models Paper • 2509.08827 • Published Sep 10 • 188
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation Paper • 2508.15774 • Published Aug 21 • 20
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 207
Lumen: Consistent Video Relighting and Harmonious Background Replacement with Video Generative Models Paper • 2508.12945 • Published Aug 18 • 14
Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model Paper • 2508.13009 • Published Aug 18 • 25
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Paper • 2508.04825 • Published Aug 6 • 58
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14 • 143
Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control Paper • 2508.08134 • Published Aug 11 • 10