---
license: apache-2.0
base_model:
- DeepGlint-AI/rice-vit-large-patch14-560
- Qwen/Qwen3-4B-Instruct-2507
---

# LLaVA-OneVision-1.5-8B Initialization Model Card

## 🚀 Overview

This model provides an initialization checkpoint for training **LLaVA-OneVision-1.5**, designed to combine strong language and vision capabilities. It integrates a powerful LLM and a state-of-the-art vision encoder with a flexible adapter, enabling efficient multimodal learning.

## 🏗️ Key Components

- **Vision Encoder:** Uses the pretrained ViT model from [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560) to extract rich visual features.
- **Adapter:** A randomly initialized adapter module with 4× token compression, enabling efficient fusion of the image and text modalities.
- **Language Model:** Incorporates the pretrained language model [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) for robust text understanding and generation.

## 📝 Usage

This initialization checkpoint is intended for downstream training and fine-tuning. For usage and training scripts, please refer to the [EvolvingLMMs-Lab/LLaVA-OneVision-1.5 repository](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5). An illustrative loading sketch is included at the end of this card.

## 📚 References

- [DeepGlint-AI/rice-vit-large-patch14-560](https://huggingface.co/DeepGlint-AI/rice-vit-large-patch14-560)
- [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- [EvolvingLMMs-Lab/LLaVA-OneVision-1.5](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5)

## Citation

If you find *LLaVA-OneVision-1.5* useful in your research, please consider citing the following paper:

```
@misc{an2025llavaonevision15fullyopenframework,
      title={LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training},
      author={Xiang An and Yin Xie and Kaicheng Yang and Wenkang Zhang and Xiuwei Zhao and Zheng Cheng and Yirui Wang and Songcen Xu and Changrui Chen and Chunsheng Wu and Huajie Tan and Chunyuan Li and Jing Yang and Jie Yu and Xiyao Wang and Bin Qin and Yumeng Wang and Zizhen Yan and Ziyong Feng and Ziwei Liu and Bo Li and Jiankang Deng},
      year={2025},
      eprint={2509.23661},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.23661},
}
```

## ⚖️ License

Apache 2.0
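
## 🧪 Illustrative Loading Sketch

The snippet below is a minimal sketch of how a checkpoint like this could be loaded with Hugging Face Transformers as a starting point for training. The repository id is a placeholder, and `trust_remote_code=True` is assumed in case the architecture ships custom modeling code; refer to the [EvolvingLMMs-Lab/LLaVA-OneVision-1.5 repository](https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5) for the officially supported loading and training scripts.

```python
# Minimal sketch of loading this initialization checkpoint with Transformers.
# The repo id below is a placeholder (assumption), not an actual Hub path.
from transformers import AutoConfig, AutoModel, AutoProcessor

repo_id = "path/to/llava-onevision-1.5-init"  # placeholder

config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# This is an initialization checkpoint: the vision encoder and language model
# carry pretrained weights while the adapter is randomly initialized, so the
# model is meant to be trained or fine-tuned rather than used for inference
# as-is.
print(config)
```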