SaSaSa2VA Model Zoo
Models and challenge report for Segmentation Augmented and Selective Averaged Sa2VA (SaSaSa2VA). 
[arXiv] [GitHub] [HuggingFace] [Challenge]
Quanzhu Niu¹* · Dengxian Gong¹* · Shihao Chen¹* · Tao Zhang¹* · Yikang Zhou¹ · Haobo Yuan² · Lu Qi¹ · Xiangtai Li³ · Shunping Ji¹†
¹WHU   ²UC Merced   ³NTU
*equal contribution   †corresponding author
We won 1st place in the RVOS (Referring Video Object Segmentation) track of the ICCV 2025 LSVOS (Large-scale Video Object Segmentation) challenge. The top three teams' methods are all based on Sa2VA. The challenge leaderboard:
| Method/Team Name | J&F | Report | 
|---|---|---|
| SaSaSa2VA (Ours) | 67.45 | link | 
| Ranhong | 64.65 | link | 
| Sa2VA-i | 64.14 | link | 
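
The J&F column is not defined on this page. In video object segmentation benchmarks it is conventionally the mean of region similarity J (mask intersection over union) and contour accuracy F (a boundary F-measure). The sketch below is only illustrative and is not the challenge's evaluation code: the toy masks, the F value of 0.8, the helper names, and the choice to score two empty masks as 1.0 are all assumptions.

```python
def region_similarity(pred, gt):
    """J: intersection over union of two binary masks (nested lists of 0/1)."""
    pairs = [(p, g) for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def j_and_f(j, f):
    """J&F: arithmetic mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


# Toy 2x3 masks (hypothetical): the prediction has one extra foreground pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 1, 0],
      [0, 0, 0]]

j = region_similarity(pred, gt)  # 2 shared pixels over 3 in the union
f = 0.8                          # hypothetical contour accuracy, not computed here
score = j_and_f(j, f)
print(f"J={j:.4f}  F={f:.4f}  J&F={score:.4f}")
```

Leaderboard values such as 67.45 are this kind of score expressed as a percentage and averaged over the evaluated objects and frames.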
We provide the following models:
| Model Name | Base MLLM | HF Link | 
|---|---|---|
| SaSaSa2VA-4B | InternVL2.5-4B | link | 
| SaSaSa2VA-14B | InternVL3.5-14B | To be released | 
| SaSaSa2VA-26B | InternVL2.5-26B | link | 
If you find our work useful, please consider citing the challenge report:
@article{sasasa2va,
  title={The 1st Solution for 7th LSVOS RVOS Track: {SaSaSa2VA}},
  author={Niu, Quanzhu and Gong, Dengxian and Chen, Shihao and Zhang, Tao and Zhou, Yikang and Yuan, Haobo and Qi, Lu and Li, Xiangtai and Ji, Shunping},
  journal={arXiv preprint arXiv:2509.16972},
  year={2025}
}