Drive&Gen: Co-Evaluating End-to-End Driving and Video Generation Models
Abstract
A novel approach combining driving models and generative world models evaluates and enhances the realism and generalization of synthetic video data for autonomous vehicle testing and planning.
Recent advances in generative models have sparked exciting new possibilities in the field of autonomous vehicles. Specifically, video generation models are now being explored as controllable virtual testing environments. Simultaneously, end-to-end (E2E) driving models have emerged as a streamlined alternative to conventional modular autonomous driving systems, gaining popularity for their simplicity and scalability. However, the application of these techniques to simulation and planning raises important questions. First, while video generation models can produce increasingly realistic videos, can these videos faithfully adhere to the specified conditions and be realistic enough for E2E autonomous planner evaluation? Second, given that data is crucial for understanding and controlling E2E planners, how can we gain deeper insights into their biases and improve their ability to generalize to out-of-distribution scenarios? In this work, we bridge the gap between driving models and generative world models (Drive&Gen) to address these questions. We propose novel statistical measures leveraging E2E drivers to evaluate the realism of generated videos. By exploiting the controllability of the video generation model, we conduct targeted experiments to investigate distribution gaps affecting E2E planner performance. Finally, we show that synthetic data produced by the video generation model offers a cost-effective alternative to real-world data collection. This synthetic data effectively improves E2E model generalization beyond existing Operational Design Domains, facilitating the expansion of autonomous vehicle services into new operational contexts.
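To make the idea of "leveraging E2E drivers to evaluate the realism of generated videos" concrete, here is a minimal illustrative sketch (not the paper's exact formulation): run a frozen E2E planner on paired real and generated clips of the same scenarios and compare the resulting behavior distributions. The `planner.plan(video)` interface, the trajectory shape, and the choice of a per-coordinate Wasserstein distance are assumptions made purely for illustration.

```python
# Illustrative sketch only: a planner-based realism probe, assuming a frozen
# E2E planner that maps a video clip to a planned trajectory of shape (T, 2).
import numpy as np
from scipy.stats import wasserstein_distance


def planner_realism_gap(planner, real_videos, generated_videos):
    """Compare planner behavior on real vs. generated videos.

    Returns a per-coordinate 1-D Wasserstein distance between the planner's
    trajectory endpoints, used here as a simple behavioral distribution gap.
    """
    # Run the (frozen) E2E planner on both sets of clips.
    real_traj = np.stack([planner.plan(v) for v in real_videos])      # (N, T, 2)
    gen_traj = np.stack([planner.plan(v) for v in generated_videos])  # (N, T, 2)

    # Summarize each plan by its endpoint (final x, y displacement).
    real_end, gen_end = real_traj[:, -1, :], gen_traj[:, -1, :]

    # Distribution distance per coordinate; small values suggest the generated
    # videos elicit planner behavior similar to that on real footage.
    return {
        axis: wasserstance := wasserstein_distance(real_end[:, i], gen_end[:, i])
        for i, axis in enumerate(("x", "y"))
    }
```

In this hypothetical setup, a small gap would indicate that the generated videos are realistic enough, from the planner's point of view, to substitute for real footage in evaluation.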
Community
IROS 2025
The following related papers were recommended by the Semantic Scholar API:
- WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving (2025)
- RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation (2025)
- 4D Driving Scene Generation With Stereo Forcing (2025)
- TeraSim-World: Worldwide Safety-Critical Data Synthesis for End-to-End Autonomous Driving (2025)
- EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer (2025)
- ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving (2025)
- OpenViGA: Video Generation for Automotive Driving Scenes by Streamlining and Fine-Tuning Open Source Models with Public Data (2025)