CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
Abstract
CARLA-Air integrates high-fidelity driving and multirotor flight simulation within a unified Unreal Engine framework, supporting joint air-ground agent modeling with photorealistic environments and multi-modal sensing capabilities.
The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency. We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially-aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities -- whose upstream development has been archived -- CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure. Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir
Community
CARLA-Air is an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process, providing a practical simulation foundation for air-ground embodied intelligence research.
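The "single process, shared physics tick" claim is the core of the design: because ground and aerial agents advance inside one world step, they always observe the same simulation clock, which bridge-based co-simulation cannot strictly guarantee. A minimal, self-contained Python sketch (illustrative only — not the actual CARLA-Air API; agent and function names are invented for this example) of lockstep stepping:

```python
from dataclasses import dataclass

DT = 0.05  # fixed physics timestep in seconds (synchronous-mode style)

@dataclass
class Agent:
    """Toy stand-in for a simulated vehicle or multirotor."""
    name: str
    velocity: float
    position: float = 0.0
    sim_time: float = 0.0

    def step(self, dt: float) -> None:
        # Advance this agent's state by exactly one physics tick.
        self.position += self.velocity * dt
        self.sim_time += dt

def unified_tick(agents, n_ticks: int, dt: float = DT) -> None:
    """Step every agent inside one shared world tick (single-process model)."""
    for _ in range(n_ticks):
        for agent in agents:
            agent.step(dt)

car = Agent("car", velocity=10.0)  # ground agent
uav = Agent("uav", velocity=5.0)   # aerial agent
unified_tick([car, uav], n_ticks=100)

# Both agents share exactly the same simulation clock: zero temporal drift.
assert car.sim_time == uav.sim_time
```

In a bridged setup, each simulator would accumulate its own clock independently, so any per-step latency difference compounds into temporal drift; the shared-tick loop above removes that failure mode by construction.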
CARLA-Air's idea of a single Unreal Engine world that unifies CARLA driving and AirSim flight in one process is a clean way to unblock air-ground embodied research. My main worry is how robust the shared physics tick stays under heavier workloads: when you scale agents and sensors, could RPC latency or frame-queue pressure introduce subtle spatial-temporal drift even with the unified world? An ablation showing the impact of removing either the ground or aerial module on downstream perception and control would help confirm where the joint gains come from. BTW, the ArxivLens breakdown (https://arxivlens.com/PaperView/Details/carla-air-fly-drones-inside-a-carla-world-a-unified-infrastructure-for-air-ground-embodied-intelligence-8595-517d7330) helped me parse the method details.
Thanks for the thoughtful feedback! The drift concern under heavier workloads is something we're actively thinking about—the unified tick does help a lot, but you're right that frame-queue pressure at scale is still a potential weak point. We're planning to run more systematic stress tests on that front.
The ablation is a great suggestion and honestly something we want to add in a follow-up.
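One lightweight way to quantify the drift concern raised above is to time each tick under a synthetic sensor load and compare mean versus worst-case duration. The sketch below is a hypothetical measurement harness (the workload function and all names are invented for illustration; it does not use the CARLA-Air API):

```python
import time
import statistics

def synthetic_sensor_load(n_pixels: int) -> int:
    # Stand-in for per-tick sensor capture cost (hypothetical workload).
    return sum(i & 0xFF for i in range(n_pixels))

def measure_tick_jitter(n_ticks: int = 50, n_pixels: int = 20_000):
    """Time each simulated tick; return (mean, worst-case) wall-clock duration."""
    durations = []
    for _ in range(n_ticks):
        start = time.perf_counter()
        synthetic_sensor_load(n_pixels)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), max(durations)

mean_dt, worst_dt = measure_tick_jitter()
assert worst_dt >= mean_dt  # worst-case tick never beats the average
```

Sweeping `n_pixels` (or the agent count) and watching how the worst-case tick diverges from the mean would give a concrete stress-test curve for the frame-queue-pressure question.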
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AirSimAG: A High-Fidelity Simulation Platform for Air-Ground Collaborative Robotics (2026)
- aerial-autonomy-stack - a Faster-than-real-time, Autopilot-agnostic, ROS2 Framework to Simulate and Deploy Perception-based Drones (2026)
- HUGE-Bench: A Benchmark for High-Level UAV Vision-Language-Action Tasks (2026)
- AeroGen: Agentic Drone Autonomy through Single-Shot Structured Prompting&Drone SDK (2026)
- PiLoT: Neural Pixel-to-3D Registration for UAV-based Ego and Target Geo-localization (2026)
- Fly, Track, Land: Infrastructure-less Magnetic Localization for Heterogeneous UAV-UGV Teaming (2026)
- Seeing Where to Deploy: Metric RGB-Based Traversability Analysis for Aerial-to-Ground Hidden Space Inspection (2026)
Get this paper in your agent:
hf papers read 2603.28032
Don't have the latest CLI? Install it with:
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper: 1
Datasets citing this paper: 0
Spaces citing this paper: 0