AndesVL is a suite of mobile-optimized Multimodal Large Language Models (MLLMs) with 0.6B to 4B parameters, built on Qwen3 language models and several visual encoders. Designed for efficient edge deployment, it achieves first-tier performance on diverse benchmarks, including text-rich, reasoning, Visual Question Answering (VQA), multi-image, multilingual, and GUI tasks. Its "1+N" LoRA architecture and Quantization-Aware LoRA Fine-Tuning (QALFT) framework enable efficient task adaptation and model compression, delivering up to a 6.7x peak decoding speedup and compression to 1.8 bits per weight on mobile chips.
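One way to picture a "1+N" LoRA design is a single shared base model plus N small, swappable task adapters. The sketch below illustrates that idea on one linear layer, assuming the standard LoRA formulation `y = W x + (alpha / r) * B A x`. It is not AndesVL's implementation; the class and method names (`MultiLoRALinear`, `add_adapter`, `set_active`) are hypothetical.

```python
import numpy as np


class MultiLoRALinear:
    """Hypothetical sketch: one frozen base weight shared by N task-specific LoRA adapters."""

    def __init__(self, in_dim, out_dim, seed=0):
        self._rng = np.random.default_rng(seed)
        # The "1": a single frozen base weight shared by every task.
        self.weight = self._rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
        # The "N": named low-rank adapters, each stored as (A, B, scale).
        self.adapters = {}
        self.active = None

    def add_adapter(self, name, rank, alpha=1.0):
        out_dim, in_dim = self.weight.shape
        a = self._rng.standard_normal((rank, in_dim)) * 0.01
        # B starts at zero, so a freshly added adapter leaves the output unchanged.
        b = np.zeros((out_dim, rank))
        self.adapters[name] = (a, b, alpha / rank)

    def set_active(self, name):
        # Select which adapter is applied; None means base weights only.
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name

    def __call__(self, x):
        # x has shape (batch, in_dim); the base path is always computed.
        y = x @ self.weight.T
        if self.active is not None:
            a, b, scale = self.adapters[self.active]
            # Low-rank update: (batch, in) -> (batch, rank) -> (batch, out).
            y = y + scale * (x @ a.T @ b.T)
        return y


layer = MultiLoRALinear(in_dim=8, out_dim=4)
layer.add_adapter("ocr", rank=2)
layer.add_adapter("gui", rank=2)
layer.set_active("ocr")
print(layer(np.ones((1, 8))).shape)
```

Switching tasks only changes which small (A, B) pair is applied, so the base weights are stored once regardless of how many adapters are added.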
Detailed model sizes and components are provided below:
Model | Total Parameters (B) | Visual Encoder | LLM |
---|---|---|---|
AndesVL-0.6B | 0.695 | SigLIP2-Base | Qwen3-0.6B |
AndesVL-1B | 0.927 | AIMv2-Large | Qwen3-0.6B |
AndesVL-2B | 2.055 | AIMv2-Large | Qwen3-1.7B |
AndesVL-4B | 4.360 | AIMv2-Large | Qwen3-4B |
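To put the 1.8 bits-per-weight figure in perspective, the following back-of-the-envelope calculation converts the parameter counts in the table into approximate weight-storage sizes. It assumes, purely for illustration, that every weight is stored at the given average bit width, and it ignores activations, the KV cache, and packing overhead. `weight_gb` is a hypothetical helper.

```python
# Total parameter counts (billions), taken from the table above.
MODELS = {
    "AndesVL-0.6B": 0.695,
    "AndesVL-1B": 0.927,
    "AndesVL-2B": 2.055,
    "AndesVL-4B": 4.360,
}


def weight_gb(params_billion, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes): params * bits / 8."""
    return params_billion * bits_per_weight / 8


for name, params in MODELS.items():
    bf16 = weight_gb(params, 16)   # bfloat16 uses 16 bits per weight
    quant = weight_gb(params, 1.8)  # assumed uniform 1.8 bits per weight
    print(f"{name}: bf16 {bf16:.2f} GB, 1.8 bpw {quant:.2f} GB")
```

For AndesVL-4B this gives about 8.72 GB in bf16 versus about 0.98 GB at 1.8 bits per weight.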
Quick Start
```python
# requires transformers>=4.52.4
import torch
from transformers import AutoModel, AutoTokenizer, CLIPImageProcessor

model_dir = "OPPOer/AndesVL-4B-Thinking"

# trust_remote_code=True loads AndesVL's custom model code, which provides model.chat()
model = AutoModel.from_pretrained(
    model_dir, trust_remote_code=True, torch_dtype=torch.bfloat16
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
image_processor = CLIPImageProcessor.from_pretrained(model_dir, trust_remote_code=True)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    # replace with your own image URL or local path
                    "url": "https://i-blog.csdnimg.cn/blog_migrate/2f4c88e71f7eabe46d062d2f1ec77d10.jpeg"
                },
            },
        ],
    },
]

res = model.chat(
    messages,
    tokenizer,
    image_processor,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    Thinking=True,
)
print(res)
```
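When experimenting with different prompts and images, it can help to build the `messages` payload programmatically. The helper below is a hypothetical convenience function that reproduces the exact structure used in the Quick Start (a text entry followed by `image_url` entries). Passing more than one image in a single message is an assumption that the example above does not confirm.

```python
def build_user_message(text, image_sources):
    """Hypothetical helper: build a single-turn payload shaped like the Quick Start `messages`."""
    content = [{"type": "text", "text": text}]
    for src in image_sources:
        # Each image becomes an {"type": "image_url", ...} entry, as in the Quick Start.
        content.append({"type": "image_url", "image_url": {"url": src}})
    return [{"role": "user", "content": content}]


messages = build_user_message("Describe this image.", ["https://example.com/photo.jpg"])
print(messages)
```

The returned list can be passed to `model.chat` in place of the hand-written `messages` above.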
Citation
If you find our work helpful, please consider citing it:
```bibtex
@misc{jin2025andesvltechnicalreportefficient,
  title={AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model},
  author={AndesVL Team, OPPO AI Center},
  year={2025},
  eprint={2510.11496},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.11496},
}
```
Acknowledgements
We are very grateful for the efforts of the Qwen, AIMv2, and SigLIP 2 projects.