# Auron-1.1B (Archived: Scaling Wall)
Note: This model demonstrates a scaling limitation in Ouroboros weight sharing. Despite having 4x more parameters than Auron-279M, it converges to nearly identical val_loss (3.180 vs. 3.188). At dim=2048 with head_dim=64, the representation is wide enough for a single pass: the shared loops become an echo chamber rather than iterative refinement.
For inference and testing, use Auron-510M (val_loss 3.035).
| Model | Params | Final Val Loss | Scaling |
|---|---|---|---|
| Auron-279M | 279M | 3.188 | Baseline |
| Auron-510M | 510M | 3.035 | -0.153 (good) |
| Auron-1.1B | 1.1B | 3.180 | +0.145 (regression) |
Paper: Auron | Code: github.com/Fy-/Auron | Blog: HuggingFace
## The Scaling Wall
- Root cause: representation saturation at dim=2048; loops add no new information
- Contributing: head_dim=64 produces 32 fragmented attention heads (Qwen 3.5 uses 256)
- Fix in progress: Chimera 1B v2 (head_dim=128) + Chimera-MoE (routed experts)
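The head-fragmentation point above follows from simple arithmetic, assuming the standard relation `n_heads = dim // head_dim` (the exact head layout in Auron is not shown here):

```python
def n_heads(dim: int, head_dim: int) -> int:
    # Standard multi-head attention split: the model width is divided
    # evenly across heads, so head count = dim // head_dim.
    assert dim % head_dim == 0, "dim must be divisible by head_dim"
    return dim // head_dim

# Auron-1.1B: dim=2048, head_dim=64 -> 32 narrow heads
print(n_heads(2048, 64))   # 32
# Planned Chimera 1B v2 fix: head_dim=128 -> 16 wider heads
print(n_heads(2048, 128))  # 16
```

Halving the head count doubles the per-head dimension, which is the change the Chimera 1B v2 fix targets.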
## Architecture
- Type: Chimera (6 bottom + 6×3 top = 24 virtual layers)
- Dim: 2048, head_dim=64, expand_v=2
- Params: 1.1B (761M unique + 311M embed)
- Trained: 250K steps, 5B tokens, WSD schedule
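The layer layout above (6 unique bottom layers, then 6 shared top layers unrolled 3 times, for 24 virtual layers) can be sketched as follows. This is an illustrative assumption about the control flow, not the actual Auron implementation; the blocks are stand-ins for transformer layers:

```python
def chimera_forward(x, bottom, shared, n_loops=3):
    """Hypothetical sketch of the Chimera layout: unique bottom blocks,
    then a shared stack whose weights are re-applied n_loops times."""
    for blk in bottom:            # 6 unique bottom layers
        x = blk(x)
    for _ in range(n_loops):      # same shared weights, looped 3 times
        for blk in shared:        # 6 shared top layers
            x = blk(x)
    return x

# Virtual depth: 6 + 6*3 = 24 layer applications,
# while only 12 layers' worth of weights exist.
```

With dim=2048, the looped stack re-reads a representation that is already wide enough after one pass, which is the "echo chamber" failure mode described above.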
```python
from ouro import load_model, generate

model, tokenizer, device = load_model("nyxia/Auron-510M")  # use 510M, not this model
generate(model, tokenizer, device, "The history of")
```
Built by Florian Gasquez (@nyxia). Part of Soulkyn.