---
license: apache-2.0
base_model:
- DeepGlint-AI/rice-vit-large-patch14-560
- Qwen/Qwen3-4B-Instruct-2507
---

# LLaVA-OneVision-1.5-4B Initialization Model Card

## 🚀 Overview

This model provides the initialization checkpoint for training **LLaVA-OneVision-1.5**. It combines a strong pretrained language model with a state-of-the-art vision encoder, joined by a lightweight adapter that enables efficient multimodal learning.

## 🏗️ Key Components

- **Vision Encoder:** The pretrained ViT from [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560), used to extract rich visual features.
- **Adapter:** A randomly initialized adapter module with 4× token compression, enabling efficient fusion of the image and text modalities (an illustrative sketch of such a module appears at the end of this card).
- **Language Model:** The pretrained [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) for robust text understanding and generation.

## 📝 Usage

This checkpoint is intended as a starting point for downstream training and fine-tuning. For training scripts and instructions, see the [EvolvingLMMs-Lab/LLaVA-OneVision-1.5 repository](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5); a minimal loading sketch also appears at the end of this card.

## 📚 References

- [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560)
- [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- [EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5)

## ⚖️ License

Apache 2.0
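
## 🔧 Adapter Sketch (Illustrative)

This card does not specify the adapter's internal design. As a rough illustration of what "4× token compression" can look like, below is a minimal, hypothetical pixel-unshuffle + MLP projector in PyTorch. The 2×2 merging scheme and layer sizes are assumptions (a common design in similar VLMs), not confirmed details of this checkpoint; the hidden dimensions of 1024 (ViT-Large) and 2560 (Qwen3-4B) follow the standard configs of the two base models.

```python
import torch
import torch.nn as nn


class PixelUnshuffleAdapter(nn.Module):
    """Hypothetical adapter: merges each 2x2 neighborhood of patch tokens into
    one token (4x compression), then projects into the LLM embedding space."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 2560):
        super().__init__()
        # Merging a 2x2 neighborhood grows the channel dimension 4x.
        self.proj = nn.Sequential(
            nn.LayerNorm(vision_dim * 4),
            nn.Linear(vision_dim * 4, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, vision_dim), num_tokens = h * w (square grid)
        b, n, c = x.shape
        h = w = int(n ** 0.5)
        x = x.view(b, h, w, c)
        # Group 2x2 spatial neighborhoods: (b, h/2, w/2, 2, 2, c)
        x = x.view(b, h // 2, 2, w // 2, 2, c).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(b, (h // 2) * (w // 2), 4 * c)
        return self.proj(x)  # (batch, num_tokens // 4, llm_dim)


adapter = PixelUnshuffleAdapter()
tokens = torch.randn(1, 1600, 1024)  # 40x40 grid from a 560px image, patch 14
print(adapter(tokens).shape)         # torch.Size([1, 400, 2560])
```

The 560px input with patch size 14 yields a 40×40 token grid (1600 tokens), which this scheme reduces to 400 tokens before they enter the language model.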
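
## 💻 Loading Sketch (Illustrative)

The linked repository is the authoritative source for training and inference. As a minimal sketch only, the snippet below shows one common way to load such a checkpoint with `transformers`, assuming it is published in a transformers-compatible layout with custom modeling code (hence `trust_remote_code=True`). The repo id is a placeholder, not a confirmed path.

```python
import torch
from transformers import AutoModel, AutoProcessor

# Hypothetical repo id; substitute the actual checkpoint path.
MODEL_ID = "your-org/llava-onevision-1.5-init"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```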