---
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
license: mit
pipeline_tag: video-text-to-text
library_name: transformers
---

This repository contains the model described in [Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence](https://huggingface.co/papers/2505.23747).

Project page: https://diankun-wu.github.io/Spatial-MLLM/

Code: https://github.com/diankun-wu/Spatial-MLLM