Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning Paper • 2506.04559 • Published Jun 5, 2025 • 2
Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models Paper • 2410.23114 • Published Oct 30, 2024
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control Paper • 2411.13807 • Published Nov 21, 2024 • 11
Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases Paper • 2404.10595 • Published Apr 16, 2024 • 1
Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts Paper • 2402.05382 • Published Feb 8, 2024
MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes Paper • 2405.14475 • Published May 23, 2024 • 1
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published Sep 26, 2024 • 39
Mixed Autoencoder for Self-supervised Visual Representation Learning Paper • 2303.17152 • Published Mar 30, 2023
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning Paper • 2312.12379 • Published Dec 19, 2023 • 2
TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models Paper • 2312.00651 • Published Dec 1, 2023 • 1
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis Paper • 2310.10477 • Published Oct 16, 2023
MagicDrive: Street View Generation with Diverse 3D Geometry Control Paper • 2310.02601 • Published Oct 4, 2023 • 1
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation Paper • 2306.04607 • Published Jun 7, 2023