Hugging Face
wayyresearch/aetheris
Wayy Research Co.
Text Generation
PyTorch
65 languages
mamba
ssm
state-space-model
mixture-of-experts
multilingual
distillation
knowledge-distillation
aya
hybrid-architecture
wayy-research
arxiv: 2312.00752
License: apache-2.0
main
aetheris
8.14 GB
1 contributor
History: 59 commits
rcgalbo
Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab
9ef2709
verified
5 days ago
Name | Size | Last commit | Updated
aetheris | - | Sync latest aetheris source code | 7 days ago
tokenizer | - | Add Aya tokenizer files (avoid gated repo dependency) | 7 days ago
.gitattributes | 1.58 kB | Add Aya tokenizer files (avoid gated repo dependency) | 7 days ago
README.md | 9.19 kB | Update model card with full architecture and training details | 7 days ago
config.yaml | 316 Bytes | Full vocab config for SFT model | 7 days ago
pytorch_model.pt | 2.15 GB (xet) | Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab | 5 days ago
stage1_checkpoint.pt | 1.64 GB (xet) | Stage 1 checkpoint: [Step 50/20000] loss=7.7500 | 11 days ago
stage1_metadata.json | 414 Bytes | Stage 1 checkpoint: [Step 50/20000] loss=7.7500 | 11 days ago
stage2_best.pt | 1.44 GB (xet) | Upload final Stage 2 best checkpoint (loss=2.7305, 20K steps) | 10 days ago
stage2_checkpoint.pt | 1.44 GB (xet) | Stage 2 checkpoint: [Step 18500/20000] loss=3.1250 | 11 days ago
stage2_final.pt | 1.44 GB (xet) | Upload Stage 2 final checkpoint (step 20000) | 10 days ago
stage2_metadata.json | 263 Bytes | Update Stage 2 metadata: COMPLETE, best loss=2.7305 | 10 days ago
student_config.yaml | 668 Bytes | Stage 1 initial: step 1000, loss=0.29, cka=0.60 | 11 days ago
training_config.yaml | 2.74 kB | Stage 1 initial: step 1000, loss=0.29, cka=0.60 | 11 days ago