Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation Paper • 2510.01284 • Published 20 days ago
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published 24 days ago
ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published 24 days ago
Seedream 4.0: Toward Next-generation Multimodal Image Generation Paper • 2509.20427 • Published 26 days ago
SD3.5-Flash: Distribution-Guided Distillation of Generative Flows Paper • 2509.21318 • Published 25 days ago
DiffusionNFT: Online Diffusion Reinforcement with Forward Process Paper • 2509.16117 • Published about 1 month ago
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Paper • 2508.04825 • Published Aug 6
Vision-Language-Vision Auto-Encoder: Scalable Knowledge Distillation from Diffusion Models Paper • 2507.07104 • Published Jul 9
DreamPoster: A Unified Framework for Image-Conditioned Generative Poster Design Paper • 2507.04218 • Published Jul 6
From KMMLU-Redux to KMMLU-Pro: A Professional Korean Benchmark Suite for LLM Evaluation Paper • 2507.08924 • Published Jul 11
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models Paper • 2507.08128 • Published Jul 10