---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
---

This repository contains the model described in the paper "Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence".
Project page: https://diankun-wu.github.io/Spatial-MLLM/