---
license: apache-2.0
tags:
  - multimodal
  - vision-language
  - video understanding
  - visuospatial cognition
  - spatial reasoning
  - vlm
  - llava
  - qwen
  - siglip
  - hiera
  - sam2
  - dual-encoder
language:
  - en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA2-7B-Init
---

Usage and Full Documentation

For a detailed model description, the training setup, datasets, evaluation results, and inference code, please refer to the following links:

- GitHub
- Hugging Face Models