
AGILLM-3 Large (698M)

AR+SAT Joint Training: a novel architecture that trains an autoregressive (AR) head and a semi-autoregressive (SAT) head simultaneously, enabling faster parallel inference.
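A minimal sketch of what joint AR+SAT training can look like: a shared transformer trunk feeds an autoregressive next-token head and a semi-autoregressive head that predicts a short block of future tokens in parallel, and the two losses are combined. The module, block size, and loss weighting below are illustrative assumptions, not the actual implementation in n.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointARSATHeads(nn.Module):
    """Illustrative AR + SAT heads on a shared trunk (not the n.py implementation)."""

    def __init__(self, d_model: int, vocab_size: int, sat_block: int = 4, sat_weight: float = 0.5):
        super().__init__()
        self.ar_head = nn.Linear(d_model, vocab_size)                 # predicts token t+1
        self.sat_head = nn.Linear(d_model, sat_block * vocab_size)    # predicts tokens t+1..t+sat_block at once
        self.sat_block = sat_block
        self.sat_weight = sat_weight
        self.vocab_size = vocab_size

    def forward(self, hidden: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # hidden: (B, T, d_model); targets: (B, T) next-token ids
        B, T, _ = hidden.shape
        ar_logits = self.ar_head(hidden)  # (B, T, V)
        ar_loss = F.cross_entropy(ar_logits.reshape(-1, self.vocab_size), targets.reshape(-1))

        # SAT loss: each position predicts the next `sat_block` targets in parallel.
        K = self.sat_block
        sat_logits = self.sat_head(hidden[:, : T - K]).reshape(B, T - K, K, self.vocab_size)
        sat_targets = torch.stack([targets[:, k : T - K + k] for k in range(K)], dim=2)  # (B, T-K, K)
        sat_loss = F.cross_entropy(sat_logits.reshape(-1, self.vocab_size), sat_targets.reshape(-1))

        return ar_loss + self.sat_weight * sat_loss
```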

Model Details

| Parameter | Value |
|---|---|
| Parameters | 698M |
| Architecture | Transformer with Expansion Rank |
| d_model | 1024 |
| Layers | 24 |
| Heads | 16 |
| Expansion Rank | 128 (2x ratio) |
| Tokenizer | DeepSeek-V3.2 (128,815 vocab) |
| Training Target | 35.76B tokens (~51.2 tokens per parameter) |
| Context Length | 1122 tokens |
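For reference, the table above corresponds to a model configuration along these lines (a hedged sketch; the field names are illustrative rather than the actual n.py config):

```python
from dataclasses import dataclass

@dataclass
class AGILLM3LargeConfig:
    # Values taken from the Model Details table; field names are assumptions.
    d_model: int = 1024
    n_layers: int = 24
    n_heads: int = 16
    expansion_rank: int = 128            # 2x ratio
    vocab_size: int = 128_815            # DeepSeek-V3.2 tokenizer
    context_length: int = 1122
    train_tokens: int = 35_760_000_000   # 35.76B-token training target
```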

Training

```bash
# Minimal run (uses sane defaults)
python n.py train --preset large

# Resume from checkpoint
python n.py train --preset large --resume ckpts/latest.pt

# Inference
python n.py infer --mode ar --ckpt ckpts/pretrain_step00176907.pt --prompt "Hello" --max_new 100
```

Defaults Baked In

  • --max_ckpts 3 β€” Auto-prune old checkpoints
  • --chilla_max_double True β€” Double Chinchilla (51.2x tokens)
  • --after_sft_steps 80000 β€” 80K SFT steps with chat format
  • Auto HF upload on each checkpoint save

Hot Config

Edit hot_config.json mid-training without restart:

{"save_every_sec": 43200, "pause_training": false}

Files

  • n.py β€” Main trainer with AR+SAT joint training
  • rotating_log.py β€” Dual rotating log
  • hf_upload.py β€” Checkpoint uploader
  • tokenizer/ β€” DeepSeek-V3.2 tokenizer

License

Apache 2.0

Author

OpenTransformers Ltd (UK Company #16940923)
