Auron

Auron-1.1B (Archived: Scaling Wall)

Note: This model demonstrates a scaling limitation in Ouroboros weight sharing. Despite having 4x more parameters than Auron-279M, it converges to a nearly identical val_loss (3.180 vs. 3.188). At dim=2048 with head_dim=64, the representation is already wide enough for a single pass, so the shared loops act as an echo chamber rather than as iterative refinement.

For inference and testing, use Auron-510M (val_loss 3.035).
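The echo-chamber failure mode above can be illustrated with a toy sketch of Ouroboros-style weight sharing: the same block weights are reapplied in a loop, so only the activations change between passes. All names and the update rule here are illustrative, not the Auron implementation.

```python
# Toy sketch (hypothetical names): one shared block, reapplied in a loop.

def shared_block(state, weights):
    # Stand-in for one pass of shared attention + MLP over the hidden state.
    return [w * s + s for w, s in zip(weights, state)]

def ouroboros_forward(state, weights, loops=3):
    # The same `weights` are reused every loop; no new parameters are added.
    # If a single pass already saturates the representation (the 1.1B case),
    # the extra loops contribute little new information.
    for _ in range(loops):
        state = shared_block(state, weights)
    return state

hidden = [1.0, 0.5, -0.25]
w = [0.1, 0.1, 0.1]
refined = ouroboros_forward(hidden, w, loops=3)
```

Parameter count stays fixed while virtual depth grows with `loops`, which is why the 510M configuration can beat the baseline without a proportional parameter increase.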

Model        Params  Final Val Loss  Δ Val Loss (vs. previous)
Auron-279M   279M    3.188           baseline
Auron-510M   510M    3.035           -0.153 (improvement)
Auron-1.1B   1.1B    3.180           +0.145 (regression)

Paper: Auron | Code: github.com/Fy-/Auron | Blog: HuggingFace

The Scaling Wall

  • Root cause: Representation saturation at dim=2048; the loops add no new information
  • Contributing: head_dim=64 produces 32 fragmented attention heads (Qwen 3.5 uses 256)
  • Fix in progress: Chimera 1B v2 (head_dim=128) + Chimera-MoE (routed experts)

Architecture

  • Type: Chimera (6 bottom + 6×3 top = 24 virtual layers)
  • Dim: 2048, head_dim=64, expand_v=2
  • Params: 1.1B (761M unique + 311M embed)
  • Trained: 250K steps, 5B tokens, WSD schedule
Usage

from ouro import load_model, generate
model, tokenizer, device = load_model("nyxia/Auron-510M")  # Use 510M
generate(model, tokenizer, device, "The history of")
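The Chimera layout from the architecture bullets (6 unique bottom blocks, then 6 shared top blocks unrolled 3 times, for 24 virtual layers) can be sketched as a layer schedule. The function and block names are illustrative, not the Auron code.

```python
# Hypothetical sketch of the Chimera layer schedule described above.

def chimera_schedule(n_bottom=6, n_top=6, top_loops=3):
    # Bottom blocks run once each with unique weights; the top blocks
    # share weights and are repeated `top_loops` times.
    layers = [f"bottom_{i}" for i in range(n_bottom)]
    for _ in range(top_loops):
        layers += [f"top_{i}" for i in range(n_top)]
    return layers

schedule = chimera_schedule()
virtual_depth = len(schedule)        # 24 virtual layers
unique_blocks = len(set(schedule))   # from only 12 unique blocks
```

This is how 12 unique blocks yield a 24-layer virtual depth; the parameter budget pays only for the unique blocks plus embeddings (761M + 311M above).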

Built by Florian Gasquez (@nyxia). Part of Soulkyn.
