Post
93
Genesis 1B is now public. 🔥
I’m training a 1.003B parameter model from scratch on 2× RTX 4090s and opened a public playground for early checkpoints.
The real bottleneck wasn’t training.
It was checkpointing:
FSDP full-state gather over PCIe = NCCL timeout hell
Switching to DCP sharded checkpoints changed the trajectory of the run.
- Playground: rob-x-ai/genesis-1b-playground
- Write-up: https://kroonen.ai/blog/distributed-checkpoint-failures-rtx4090/
I’m training a 1.003B parameter model from scratch on 2× RTX 4090s and opened a public playground for early checkpoints.
The real bottleneck wasn’t training.
It was checkpointing:
FSDP full-state gather over PCIe = NCCL timeout hell
Switching to DCP sharded checkpoints changed the trajectory of the run.
- Playground: rob-x-ai/genesis-1b-playground
- Write-up: https://kroonen.ai/blog/distributed-checkpoint-failures-rtx4090/