---
license: apache-2.0
base_model:
- DeepGlint-AI/rice-vit-large-patch14-560
- Qwen/Qwen3-4B-Instruct-2507
---

# LLaVA-OneVision-1.5-4B Initialization Model Card

## 🚀 Overview

This model provides the initialization checkpoint for training **LLaVA-OneVision-1.5**. It combines a strong pretrained language model with a state-of-the-art vision encoder, joined by a lightweight adapter that enables efficient multimodal learning.

## 🏗️ Key Components

- **Vision Encoder:** The pretrained ViT from [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560), used to extract rich visual features.
- **Adapter:** A randomly initialized adapter module with 4× token compression, enabling efficient fusion of the image and text modalities (an illustrative sketch of such a module appears at the end of this card).
- **Language Model:** The pretrained [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) for robust text understanding and generation.

## 📝 Usage

This checkpoint is intended as a starting point for downstream training and fine-tuning. For training scripts and instructions, see the [EvolvingLMMs-Lab/LLaVA-OneVision-1.5 repository](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5); a minimal loading sketch also appears at the end of this card.

## 📚 References

- [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560)
- [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- [EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5)

## ⚖️ License

Apache 2.0
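
## 🔧 Adapter Sketch (Illustrative)

This card does not specify the adapter's internal design. As a rough illustration of what "4× token compression" can look like, below is a minimal, hypothetical pixel-unshuffle + MLP projector in PyTorch. The 2×2 merging scheme and layer sizes are assumptions (a common design in similar VLMs), not confirmed details of this checkpoint; the hidden dimensions of 1024 (ViT-Large) and 2560 (Qwen3-4B) follow the standard configs of the two base models.

```python
import torch
import torch.nn as nn


class PixelUnshuffleAdapter(nn.Module):
    """Hypothetical adapter: merges each 2x2 neighborhood of patch tokens into
    one token (4x compression), then projects into the LLM embedding space."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 2560):
        super().__init__()
        # Merging a 2x2 neighborhood grows the channel dimension 4x.
        self.proj = nn.Sequential(
            nn.LayerNorm(vision_dim * 4),
            nn.Linear(vision_dim * 4, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, vision_dim), num_tokens = h * w (square grid)
        b, n, c = x.shape
        h = w = int(n ** 0.5)
        x = x.view(b, h, w, c)
        # Group 2x2 spatial neighborhoods: (b, h/2, w/2, 2, 2, c)
        x = x.view(b, h // 2, 2, w // 2, 2, c).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(b, (h // 2) * (w // 2), 4 * c)
        return self.proj(x)  # (batch, num_tokens // 4, llm_dim)


adapter = PixelUnshuffleAdapter()
tokens = torch.randn(1, 1600, 1024)  # 40x40 grid from a 560px image, patch 14
print(adapter(tokens).shape)         # torch.Size([1, 400, 2560])
```

The 560px input with patch size 14 yields a 40×40 token grid (1600 tokens), which this scheme reduces to 400 tokens before they enter the language model.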
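
## 💻 Loading Sketch (Illustrative)

The linked repository is the authoritative source for training and inference. As a minimal sketch only, the snippet below shows one common way to load such a checkpoint with `transformers`, assuming it is published in a transformers-compatible layout with custom modeling code (hence `trust_remote_code=True`). The repo id is a placeholder, not a confirmed path.

```python
import torch
from transformers import AutoModel, AutoProcessor

# Hypothetical repo id; substitute the actual checkpoint path.
MODEL_ID = "your-org/llava-onevision-1.5-init"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```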