# AGILLM-3 Large (698M)
**AR+SAT Joint Training**: a novel architecture that trains an autoregressive and a semi-autoregressive head simultaneously, enabling faster parallel inference.
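The exact head layout and loss weighting are not documented in this card, so the following is only a minimal PyTorch sketch of what joint AR + SAT training could look like: a shared trunk feeds an autoregressive next-token head plus a semi-autoregressive head that predicts a short block of future tokens in parallel. All names here (`ar_head`, `sat_head`, `sat_block`, `sat_weight`) are assumptions, not taken from `n.py`.

```python
# Hypothetical sketch of AR + SAT joint training. Head names, block size,
# and loss weighting are assumptions, not the actual n.py implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def joint_loss(hidden, ar_head, sat_head, targets, sat_block=4, sat_weight=0.5):
    """hidden: [B, T, d_model] from the shared trunk; targets: [B, T] token ids."""
    # Autoregressive head: predict the next token at every position.
    ar_logits = ar_head(hidden[:, :-1])                              # [B, T-1, V]
    ar = F.cross_entropy(ar_logits.reshape(-1, ar_logits.size(-1)),
                         targets[:, 1:].reshape(-1))

    # Semi-autoregressive head: from each position, predict the next
    # `sat_block` tokens in one shot (logits for all block slots at once).
    sat_logits = sat_head(hidden[:, :-sat_block])                    # [B, T-k, k*V]
    B, L, _ = sat_logits.shape
    sat_logits = sat_logits.view(B, L, sat_block, -1)                # [B, L, k, V]
    # Block targets: for position t, the tokens t+1 .. t+sat_block.
    blocks = torch.stack(
        [targets[:, i + 1:i + 1 + L] for i in range(sat_block)], dim=2)  # [B, L, k]
    sat = F.cross_entropy(sat_logits.reshape(-1, sat_logits.size(-1)),
                          blocks.reshape(-1))
    return ar + sat_weight * sat


if __name__ == "__main__":
    # Tiny smoke test with a reduced vocab so the demo stays lightweight.
    B, T, D, V, K = 2, 16, 64, 100, 4
    hidden = torch.randn(B, T, D)
    targets = torch.randint(0, V, (B, T))
    print(joint_loss(hidden, nn.Linear(D, V), nn.Linear(D, K * V), targets, sat_block=K))
```

At inference time a block-prediction head of this kind can propose several tokens per forward pass, which is where the faster parallel decoding comes from.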
## Model Details
| Parameter | Value |
|---|---|
| Parameters | 698M |
| Architecture | Transformer with Expansion Rank |
| d_model | 1024 |
| Layers | 24 |
| Heads | 16 |
| Expansion Rank | 128 (2x ratio) |
| Tokenizer | DeepSeek-V3.2 (128,815 vocab) |
| Training Target | 35.76B tokens (≈51.2 tokens per parameter) |
| Context Length | 1122 tokens |
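For reference, the table translates into roughly the following configuration object; the field names are illustrative and the actual `large` preset lives in `n.py`.

```python
# Illustrative mirror of the "Model Details" table above; not the real preset.
from dataclasses import dataclass


@dataclass
class LargePreset:
    d_model: int = 1024
    n_layers: int = 24
    n_heads: int = 16
    expansion_rank: int = 128            # "2x ratio" per the table above
    vocab_size: int = 128_815            # DeepSeek-V3.2 tokenizer
    context_length: int = 1122
    train_tokens: int = 35_760_000_000   # ≈51.2 tokens per parameter
```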
## Training
```bash
# Minimal run (uses sane defaults)
python n.py train --preset large

# Resume from checkpoint
python n.py train --preset large --resume ckpts/latest.pt

# Inference
python n.py infer --mode ar --ckpt ckpts/pretrain_step00176907.pt --prompt "Hello" --max_new 100
```
## Defaults Baked In
- `--max_ckpts 3` – Auto-prune old checkpoints (see the pruning sketch below)
- `--chilla_max_double True` – Double Chinchilla (51.2 tokens per parameter)
- `--after_sft_steps 80000` – 80K SFT steps with chat format
- Auto HF upload on each checkpoint save
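As referenced above, the pruning behind `--max_ckpts` could look like the sketch below; the directory layout and function name are assumptions, and the real logic lives in `n.py`.

```python
# Illustrative checkpoint pruning under --max_ckpts (paths and naming assumed).
from pathlib import Path


def prune_checkpoints(ckpt_dir: str = "ckpts", max_ckpts: int = 3) -> None:
    """Keep only the newest `max_ckpts` pretrain checkpoints, delete the rest."""
    ckpts = sorted(Path(ckpt_dir).glob("pretrain_step*.pt"),
                   key=lambda p: p.stat().st_mtime)
    for old in ckpts[:-max_ckpts]:
        old.unlink()
```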
## Hot Config
Edit `hot_config.json` mid-training, no restart required:

```json
{"save_every_sec": 43200, "pause_training": false}
```
## Files
- `n.py` – Main trainer with AR+SAT joint training
- `rotating_log.py` – Dual rotating log
- `hf_upload.py` – Checkpoint uploader
- `tokenizer/` – DeepSeek-V3.2 tokenizer
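For illustration, a checkpoint uploader like `hf_upload.py` could be built on `huggingface_hub`'s `HfApi.upload_file`; the function and repo id below are placeholders, not the script's actual interface.

```python
# Plausible sketch of a checkpoint uploader; repo_id is a placeholder.
from huggingface_hub import HfApi


def upload_checkpoint(ckpt_path: str, repo_id: str = "your-org/your-model") -> None:
    api = HfApi()  # uses the token from `huggingface-cli login` or HF_TOKEN
    api.upload_file(
        path_or_fileobj=ckpt_path,
        path_in_repo=f"ckpts/{ckpt_path.split('/')[-1]}",
        repo_id=repo_id,
        repo_type="model",
    )
```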
## License
Apache 2.0
## Author
OpenTransformers Ltd (UK Company #16940923)