wayyresearch/aetheris

Text Generation

state-space-model

mixture-of-experts

Mixture of Experts

knowledge-distillation

hybrid-architecture

8.14 GB

1 contributor

History: 59 commits

rcgalbo

Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab

9ef2709 verified 5 days ago

| Name | Size | Last commit | Updated |
|---|---|---|---|
| aetheris | | Sync latest aetheris source code | 7 days ago |
| tokenizer | | Add Aya tokenizer files (avoid gated repo dependency) | 7 days ago |
| .gitattributes | 1.58 kB | Add Aya tokenizer files (avoid gated repo dependency) | 7 days ago |
| README.md | 9.19 kB | Update model card with full architecture and training details | 7 days ago |
| config.yaml | 316 Bytes | Full vocab config for SFT model | 7 days ago |
| pytorch_model.pt | 2.15 GB | Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab | 5 days ago |
| stage1_checkpoint.pt | 1.64 GB | Stage 1 checkpoint: [Step 50/20000] loss=7.7500 | 11 days ago |
| stage1_metadata.json | 414 Bytes | Stage 1 checkpoint: [Step 50/20000] loss=7.7500 | 11 days ago |
| stage2_best.pt | 1.44 GB | Upload final Stage 2 best checkpoint (loss=2.7305, 20K steps) | 10 days ago |
| stage2_checkpoint.pt | 1.44 GB | Stage 2 checkpoint: [Step 18500/20000] loss=3.1250 | 11 days ago |
| stage2_final.pt | 1.44 GB | Upload Stage 2 final checkpoint (step 20000) | 10 days ago |
| stage2_metadata.json | 263 Bytes | Update Stage 2 metadata: COMPLETE, best loss=2.7305 | 10 days ago |
| student_config.yaml | 668 Bytes | Stage 1 initial: step 1000, loss=0.29, cka=0.60 | 11 days ago |
| training_config.yaml | 2.74 kB | Stage 1 initial: step 1000, loss=0.29, cka=0.60 | 11 days ago |