---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
---
This repository contains the model described in [Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence](https://huggingface.co/papers/2505.23747).

Project page: https://diankun-wu.github.io/Spatial-MLLM/

Code: https://github.com/diankun-wu/Spatial-MLLM