Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Paper • 2604.24954 • Published 7 days ago • 15
FlashDecoding++: Faster Large Language Model Inference on GPUs Paper • 2311.01282 • Published Nov 2, 2023 • 38
Liger Kernel: Efficient Triton Kernels for LLM Training Paper • 2410.10989 • Published Oct 14, 2024 • 3
FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference Paper • 2505.22758 • Published May 28, 2025 • 1
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published Feb 27 • 98
FineWeb: decanting the web for the finest text data at scale 🍷 • 1.33k • Explore and download the FineWeb web‑text dataset
The Ultra-Scale Playbook 🌌 • 3.83k • The ultimate guide to training LLMs on large GPU clusters