---
tags:
- kernel
---

# Optimizer

Optimizer is a Python package that provides:

- PyTorch implementations of recent optimizer algorithms
- Support for parallelism techniques for efficient large-scale training

### Currently implemented

- [Parallel Muon with FSDP2](./docs/muon/parallel_muon.pdf)

## Usage

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from kernels import get_kernel

# Load the optimizer package from the Hugging Face Hub
optimizer = get_kernel("motif-technologies/optimizer")

model = None  # your model here
fsdp_model = FSDP(model)

# Build the Muon optimizer over the sharded model's parameters
optim = optimizer.Muon(
    fsdp_model.parameters(),
    lr=0.01,
    momentum=0.9,
    weight_decay=1e-4,
)
```

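Once the optimizer is constructed, it can be driven from an ordinary training loop. The sketch below assumes that `Muon` follows the standard `torch.optim.Optimizer` interface (`zero_grad()` and `step()`), which this README does not state explicitly. `train_step` and its arguments are hypothetical names introduced for illustration and are not part of this package.

```python
def train_step(model, optim, inputs, targets, loss_fn):
    """Run one optimization step and return the loss as a Python float.

    Assumes `optim` exposes `zero_grad()` and `step()` like a
    `torch.optim.Optimizer`, and that `loss_fn` returns a scalar
    tensor supporting `backward()` and `item()`.
    """
    optim.zero_grad()                  # clear gradients from the previous step
    outputs = model(inputs)            # forward pass
    loss = loss_fn(outputs, targets)   # scalar loss
    loss.backward()                    # backward pass populates .grad
    optim.step()                       # apply the optimizer update
    return loss.item()
```

With the objects from the usage snippet, a call might look like `loss = train_step(fsdp_model, optim, inputs, targets, torch.nn.functional.cross_entropy)`, where `inputs` and `targets` are placeholder batches from your own data loader.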