ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer Paper • 2603.15478 • Published 4 days ago • 24
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published 4 days ago • 28
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 32
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 39
Discrete Diffusion LLM & MLLM Collection An collection of research/models in discrete diffusion large language and multimodal models • 57 items • Updated Jun 17, 2025 • 4
Discrete Diffusion in Large Language and Multimodal Models: A Survey Paper • 2506.13759 • Published Jun 16, 2025 • 43
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding Paper • 2505.16990 • Published May 22, 2025 • 22
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models Paper • 2506.01667 • Published Jun 2, 2025 • 21