Vision-Language-Action Models for Autonomous Driving: Past
Generate captions for images
Segment objects in RGBD images