wayyresearch/aetheris

Tags: Text Generation, PyTorch, mamba, ssm, state-space-model, mixture-of-experts, multilingual, distillation, knowledge-distillation, aya, hybrid-architecture, wayy-research
aetheris (8.14 GB)
  • 1 contributor
History: 59 commits
Latest commit by rcgalbo: "Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab" (9ef2709, verified, 5 days ago)
  • aetheris: "Sync latest aetheris source code", 7 days ago
  • tokenizer: "Add Aya tokenizer files (avoid gated repo dependency)", 7 days ago
  • .gitattributes (1.58 kB): "Add Aya tokenizer files (avoid gated repo dependency)", 7 days ago
  • README.md (9.19 kB): "Update model card with full architecture and training details", 7 days ago
  • config.yaml (316 Bytes): "Full vocab config for SFT model", 7 days ago
  • pytorch_model.pt (2.15 GB, xet): "Stage 3 SFT best (step 3550, loss 3.6461) - pruned 80K vocab", 5 days ago
  • stage1_checkpoint.pt (1.64 GB, xet): "Stage 1 checkpoint: [Step 50/20000] loss=7.7500", 11 days ago
  • stage1_metadata.json (414 Bytes): "Stage 1 checkpoint: [Step 50/20000] loss=7.7500", 11 days ago
  • stage2_best.pt (1.44 GB, xet): "Upload final Stage 2 best checkpoint (loss=2.7305, 20K steps)", 10 days ago
  • stage2_checkpoint.pt (1.44 GB, xet): "Stage 2 checkpoint: [Step 18500/20000] loss=3.1250", 11 days ago
  • stage2_final.pt (1.44 GB, xet): "Upload Stage 2 final checkpoint (step 20000)", 10 days ago
  • stage2_metadata.json (263 Bytes): "Update Stage 2 metadata: COMPLETE, best loss=2.7305", 10 days ago
  • student_config.yaml (668 Bytes): "Stage 1 initial: step 1000, loss=0.29, cka=0.60", 11 days ago
  • training_config.yaml (2.74 kB): "Stage 1 initial: step 1000, loss=0.29, cka=0.60", 11 days ago