Submitted by Saeed Ranjbar Alvar
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
Huawei's Vancouver VBDAI Lab