Qwen/Qwen3-VL-30B-A3B-Instruct Image-Text-to-Text • 31B • Updated Nov 26, 2025 • 941k • 507
Running Featured 558 Vision Arena (Testing VLMs side-by-side) Display image analysis results
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
Running on CPU Upgrade Featured 2.89k The Smol Training Playbook The secrets to building world-class LLMs
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 7 items • Updated 20 days ago • 161
yayayaaa/florence-2-large-ft-moredetailed Image-to-Text • 0.8B • Updated Dec 13, 2025 • 88 • 15
meta-llama/Llama-3.2-11B-Vision Image-Text-to-Text • 11B • Updated Sep 27, 2024 • 9.87k • 578
Runtime error Featured 515 Florence2 + SAM2 Segment and caption objects in images and videos