---
license: apache-2.0
tags:
- multimodal
- vision-language
- video understanding
- visuospatial cognition
- spatial reasoning
- vlm
- llava
- qwen
- siglip
- hiera
- sam2
- dual-encoder
datasets:
- nkkbr/ViCA-thinking-2.68k
language:
- en
library_name: transformers
pipeline_tag: video-text-to-text
model_name: ViCA2-7B-Thinking
---
## Usage and Full Documentation
For a detailed model description, training setup, datasets, evaluation results, and inference code, please refer to the following links: