---
title: README
emoji: π
colorFrom: red
colorTo: red
sdk: static
pinned: false
---
HF PRESS

The Ultra-Scale Playbook: Training LLMs on GPU Clusters
Essential reading for anyone scaling ML infrastructure
The knowledge of how to efficiently scale training to large GPU clusters has so far been kept within a handful of big industry labs. With this book, we set out to lift the veil and release a comprehensive resource on distributed training.
AUTHORS
Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf

AFFILIATION
Hugging Face

PUBLISHED
Jul 30, 2025
The PDF of this book is accessible with a PRO subscription.
(*If you experience issues downloading the PDF with Chrome, try restarting or updating the browser, or use a different one.)
The Nanotron team focuses on sharing open knowledge and developing open-source libraries for efficient distributed training of large-scale AI models.
Some of its contributions are:
- the Nanotron library
- the Picotron library
- the Ultra-Scale Playbook, a comprehensive book covering the distributed/parallelisation and low-level techniques that can be used to efficiently train models at the largest scales (the simplest of these, data parallelism, is sketched below)
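
As a point of reference (not taken from the playbook or the Nanotron library), here is a minimal sketch of data parallelism with PyTorch's `DistributedDataParallel`; the model, batch sizes, and launch command are illustrative assumptions only.

```python
# Minimal data-parallel training sketch using PyTorch DDP (illustrative, not Nanotron code).
# Launch with one process per GPU, e.g.: torchrun --nproc_per_node=8 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # one process per GPU, NCCL for GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = torch.nn.Linear(1024, 1024).to(device) # toy model standing in for an LLM
    model = DDP(model, device_ids=[local_rank])    # gradients are all-reduced across ranks
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(32, 1024, device=device)   # each rank would see a different data shard
        loss = model(x).pow(2).mean()
        loss.backward()                            # DDP overlaps gradient all-reduce with backward
        optimizer.step()
        optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```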