TRL v1.0: Post-Training Library Built to Move with the Field (10 days ago)
KV Caching Explained: Optimizing Transformer Inference Efficiency (Jan 30, 2025)
SmolVLM Grows Smaller – Introducing the 256M & 500M Models! (Jan 23, 2025)