Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking • Paper • 2503.16538 • Published Mar 18