
Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text
•
8B
•
Updated
•
3.31M
•
•
1.31k
Multimodal models that take image + text as input and produce natural language output. Use cases: chart QA, visual document reasoning, VQA.