---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
---

This repository contains the model described in [Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence](https://huggingface.co/papers/2505.23747).

Project page: https://diankun-wu.github.io/Spatial-MLLM/

Code: https://github.com/diankun-wu/Spatial-MLLM