Article Vision Language Model Alignment in TRL ⚡️ By sergiopaniego and 4 others • 2 days ago • 32
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • May 21 • 201
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 39
Pixels, Patterns, but No Poetry: To See The World like Humans Paper • 2507.16863 • Published 18 days ago • 66
Typhoon 2: A Family of Open Text and Multimodal Thai Large Language Models Paper • 2412.13702 • Published Dec 18, 2024 • 1
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published 22 days ago • 233
Article 🤔👀🎬🖥️📖 Kimi-VL-A3B-Thinking-2506: A Quick Navigation By moonshotai and 1 other • Jun 21 • 66
Article Introducing ColQwen-Omni: Retrieve in every modality By manu and 4 others • 23 days ago • 64
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning Paper • 2507.05255 • Published Jul 7 • 68
How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks Paper • 2507.01955 • Published Jul 2 • 34
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 614
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published Jul 1 • 210
Article Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub By nvidia and 11 others • Jun 27 • 28
OpenThaiGPT 1.6 and R1: Thai-Centric Open Source and Reasoning Large Language Models Paper • 2504.01789 • Published Apr 2 • 2
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding Paper • 2506.16035 • Published Jun 19 • 86
Eliciting Fine-Tuned Transformer Capabilities via Inference-Time Techniques Paper • 2506.08060 • Published Jun 9 • 8