Ulysses Sequence Parallelism: Training with Million-Token Contexts • kashif, stas • Mar 9 • 27
From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate • mirinflim, aldopareja, muellerzr, stas • Jun 13, 2024 • 62
Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model • HugoLaurencon, davanstrien, stas, Leyo, SaulLu, TimeRobber, skaramcheti, aps, giadap, yjernite, VictorSanh • Aug 22, 2023 • 37
Incredibly Fast BLOOM Inference with DeepSpeed and Accelerate • stas, sgugger • Sep 16, 2022 • 1
Fit More and Train Faster With ZeRO via DeepSpeed and FairScale • stas • Jan 19, 2021 • 5