README.md · nkkbr/ViCA2-init at main

---
license: apache-2.0
tags:
  - multimodal
  - vision-language
  - video understanding
  - visuospatial cognition
  - spatial reasoning
  - vlm
  - llava
  - qwen
  - siglip
  - hiera
  - sam2
  - dual-encoder
language:
  - en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA2-7B-Init
---

Usage and Full Documentation

For the detailed model description, training setup, datasets, evaluation results, and inference code, please refer to the following links: