SaSaSa2VA: Segmentation Augmented and Selective Averaged Sa2VA

[📜 arXiv] [🧑‍💻 GitHub] [🤗 HuggingFace] [🎯 Challenge]

Quanzhu Niu¹* · Dengxian Gong¹* · Shihao Chen¹* · Tao Zhang¹* · Yikang Zhou¹ · Haobo Yuan² · Lu Qi¹ · Xiangtai Li³ · Shunping Ji¹†

¹WHU ²UC Merced ³NTU

*equal contribution †corresponding author

🎉 1st Place in ICCV 2025 LSVOS Challenge RVOS Track! 🎉

We won 1st place in the RVOS (Referring Video Object Segmentation) track of the ICCV 2025 LSVOS (Large-scale Video Object Segmentation) Challenge. The methods of the top three teams are all based on Sa2VA. The challenge leaderboard:

Method/Team Name	J&F	Report
🏅 SaSaSa2VA (Ours)	67.45	📝 link
🥈 Ranhong	64.65	📝 link
🥉 Sa2VA-i	64.14	📝 link
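
For readers unfamiliar with the J&F column: in video object segmentation benchmarks it is conventionally the mean of region similarity J (mask intersection-over-union) and contour accuracy F (a boundary F-measure). The snippet below is a minimal sketch of that convention, not the challenge's official evaluation code. The function names `region_similarity` and `j_and_f` are hypothetical, the boundary score F is taken as a precomputed input, and scores are on a 0 to 1 scale, whereas the leaderboard above appears to use 0 to 100.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (illustrative representation only).
Mask = Sequence[Sequence[int]]


def region_similarity(pred: Mask, gt: Mask) -> float:
    """Region similarity J: intersection-over-union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j: float, f: float) -> float:
    """J&F: arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
gt_mask = [[1, 0, 0],
           [0, 1, 1]]

j_score = region_similarity(pred_mask, gt_mask)
# F would come from a boundary-matching routine; 0.75 is a placeholder value.
print(j_score, j_and_f(j_score, 0.75))
```

In practice both J and F are computed per frame and per object and then averaged over the dataset before the two are combined.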

Model Zoo

We provide the following models:

Model Name	Base MLLM	HF Link
SaSaSa2VA-4B	InternVL2.5-4B	🤗 link
SaSaSa2VA-14B	InternVL3.5-14B	To be released
SaSaSa2VA-26B	InternVL2.5-26B	🤗 link

Citation

If you find our work useful, please consider citing the challenge report:

@article{sasasa2va,
  title={The 1st Solution for 7th LSVOS RVOS Track: {SaSaSa2VA}},
  author={Niu, Quanzhu and Gong, Dengxian and Chen, Shihao and Zhang, Tao and Zhou, Yikang and Yuan, Haobo and Qi, Lu and Li, Xiangtai and Ji, Shunping},
  journal={arXiv preprint arXiv:2509.16972},
  year={2025}
}


SaSaSa2VA-4B checkpoint: Safetensors format, 4B params, tensor types F32 and BF16.
