multimodal 💬🖼️
> new moondream (VLM) is out: it's a 4-bit quantized (with QAT) version of moondream-2b, runs on 2.5GB VRAM at 184 tps with only a 0.6% drop in accuracy (OS)
> ByteDance released BAGEL-7B, an omni model that understands and generates both image + text. they also released Dolphin, a document parsing VLM 💬 (OS)
> Google DeepMind dropped MedGemma at I/O, a VLM that can interpret medical scans, and Gemma 3n, an omni model with competitive LLM performance
> MMaDA is a new 8B diffusion language model that can generate image and text
LLMs
> Mistral released Devstral, a 24B coding assistant (OS) 👩🏻‍💻
> Fairy R1-32B is a new reasoning model -- a distilled version of DeepSeek-R1-Distill-Qwen-32B (OS)
> NVIDIA released AceReason-Nemotron-14B, a new 14B math and code reasoning model
> sarvam-m is a new Indic LM with hybrid thinking mode, based on Mistral Small (OS)
> samhitika-0.0.1 is a new Sanskrit corpus (BookCorpus translated with Gemma3-27B)
image generation 🎨
> MTVCrafter is a new human motion animation generator