SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

[📂 GitHub] [📦 Benchmark] [🌐 Homepage] [📄 Paper]

Highlights

  • 🔥 We introduce Segment Concept (SeC), a concept-driven segmentation framework for video object segmentation that integrates Large Vision-Language Models (LVLMs) for robust, object-centric representations.
  • 🔥 SeC dynamically balances semantic reasoning with feature matching, adapting its computational effort to scene complexity for optimal segmentation performance.
  • 🔥 We propose the Semantic Complex Scenarios Video Object Segmentation (SeCVOS) benchmark, designed to evaluate segmentation in challenging scenarios.
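The highlights do not say how SeC decides between semantic reasoning and feature matching, so the following is only a toy illustration of the general idea of complexity-gated routing, not SeC's actual mechanism. It assumes a simple scene-change score (mean absolute pixel difference) and a fixed threshold. The function names, the `"semantic"` and `"matching"` labels, the threshold value, and the choice to send the first frame down the expensive path are all hypothetical.

```python
from typing import List, Sequence

# Toy stand-in for an image: a flat sequence of pixel intensities in [0, 1].
Frame = Sequence[float]


def scene_change_score(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    if len(prev) != len(curr) or len(prev) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(a - b) for a, b in zip(prev, curr)) / len(prev)


def route_frames(frames: List[Frame], threshold: float = 0.2) -> List[str]:
    """Label each frame with the (hypothetical) path that would process it.

    "semantic" stands for an expensive reasoning step and "matching" for a
    cheap feature-matching step. Both labels are illustrative only.
    """
    if not frames:
        return []
    # Assumption: the first frame always takes the expensive path.
    routes = ["semantic"]
    for prev, curr in zip(frames, frames[1:]):
        score = scene_change_score(prev, curr)
        routes.append("semantic" if score > threshold else "matching")
    return routes


if __name__ == "__main__":
    clip = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0]]
    print(route_frames(clip))
```

In this sketch the expensive path runs only when consecutive frames differ sharply, which is one simple way to spend more computation on complex scenes and less on stable ones.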

SeC Performance

| Model | SA-V val | SA-V test | LVOS v2 val | MOSE val | DAVIS 2017 val | YTVOS 2019 val | SeCVOS |
|---|---|---|---|---|---|---|---|
| SAM 2.1 | 78.6 | 79.6 | 84.1 | 74.5 | 90.6 | 88.7 | 58.2 |
| SAMURAI | 79.8 | 80.0 | 84.2 | 72.6 | 89.9 | 88.3 | 62.2 |
| SAM2.1Long | 81.1 | 81.2 | 85.9 | 75.2 | 91.4 | 88.7 | 62.3 |
| SeC (Ours) | 82.7 | 81.7 | 86.5 | 75.3 | 91.3 | 88.6 | 70.0 |
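To read the table as per-benchmark differences, the scores can be placed in a small dictionary and subtracted. This is a minimal sketch: the numbers are copied from the table above, while the variable and function names (`SCORES`, `gain_over`) are made up for illustration.

```python
# Benchmark columns, in the same order as the table.
BENCHMARKS = [
    "SA-V val", "SA-V test", "LVOS v2 val", "MOSE val",
    "DAVIS 2017 val", "YTVOS 2019 val", "SeCVOS",
]

# Scores copied from the table, one list per model.
SCORES = {
    "SAM 2.1":    [78.6, 79.6, 84.1, 74.5, 90.6, 88.7, 58.2],
    "SAMURAI":    [79.8, 80.0, 84.2, 72.6, 89.9, 88.3, 62.2],
    "SAM2.1Long": [81.1, 81.2, 85.9, 75.2, 91.4, 88.7, 62.3],
    "SeC (Ours)": [82.7, 81.7, 86.5, 75.3, 91.3, 88.6, 70.0],
}


def gain_over(baseline: str, model: str = "SeC (Ours)") -> dict:
    """Return the per-benchmark difference (model minus baseline), rounded to 0.1."""
    return {
        name: round(m - b, 1)
        for name, m, b in zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    }


if __name__ == "__main__":
    for baseline in ("SAM 2.1", "SAMURAI", "SAM2.1Long"):
        print(baseline, gain_over(baseline))
```

On SeCVOS, the difference comes out to 11.8 points over SAM 2.1 and 7.7 points over SAM2.1Long, while on DAVIS 2017 val and YTVOS 2019 val SeC is 0.1 points behind SAM2.1Long.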

Citation

If you find this project useful in your research, please consider citing:

@article{zhang2025sec,
  title     = {SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction},
  author    = {Zhixiong Zhang and Shuangrui Ding and Xiaoyi Dong and Songxin He and Jianfan Lin and Junsong Tang and Yuhang Zang and Yuhang Cao and Dahua Lin and Jiaqi Wang},
  journal   = {arXiv preprint arXiv:2507.15852},
  year      = {2025}
}