Auron

Auron-1.1B (Archived: Scaling Wall)

Note: This model demonstrates a scaling limitation in Ouroboros weight sharing. Despite having 4x more parameters than Auron-279M, it converges to a nearly identical val_loss (3.180 vs. 3.188). At dim=2048 with head_dim=64, the representation is already wide enough for a single pass, so the shared loops act as an echo chamber rather than as iterative refinement.

For inference and testing, use Auron-510M (val_loss 3.035).
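The echo-chamber failure mode above can be illustrated with a toy sketch of Ouroboros-style weight sharing: the same block weights are reapplied in a loop, so only the activations change between passes. All names and the update rule here are illustrative, not the Auron implementation.

```python
# Toy sketch (hypothetical names): one shared block, reapplied in a loop.

def shared_block(state, weights):
    # Stand-in for one pass of shared attention + MLP over the hidden state.
    return [w * s + s for w, s in zip(weights, state)]

def ouroboros_forward(state, weights, loops=3):
    # The same `weights` are reused every loop; no new parameters are added.
    # If a single pass already saturates the representation (the 1.1B case),
    # the extra loops contribute little new information.
    for _ in range(loops):
        state = shared_block(state, weights)
    return state

hidden = [1.0, 0.5, -0.25]
w = [0.1, 0.1, 0.1]
refined = ouroboros_forward(hidden, w, loops=3)
```

Parameter count stays fixed while virtual depth grows with `loops`, which is why the 510M configuration can beat the baseline without a proportional parameter increase.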

Model        Params  Final Val Loss  Δ Val Loss (vs. previous)
Auron-279M   279M    3.188           baseline
Auron-510M   510M    3.035           -0.153 (improvement)
Auron-1.1B   1.1B    3.180           +0.145 (regression)

Paper: Auron | Code: github.com/Fy-/Auron | Blog: HuggingFace

The Scaling Wall

  • Root cause: Representation saturation at dim=2048; the loops add no new information
  • Contributing: head_dim=64 produces 32 fragmented attention heads (Qwen 3.5 uses 256)
  • Fix in progress: Chimera 1B v2 (head_dim=128) + Chimera-MoE (routed experts)

Architecture

  • Type: Chimera (6 bottom + 6×3 top = 24 virtual layers)
  • Dim: 2048, head_dim=64, expand_v=2
  • Params: 1.1B (761M unique + 311M embed)
  • Trained: 250K steps, 5B tokens, WSD schedule
Usage

from ouro import load_model, generate
model, tokenizer, device = load_model("nyxia/Auron-510M")  # Use 510M
generate(model, tokenizer, device, "The history of")
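The Chimera layout from the architecture bullets (6 unique bottom blocks, then 6 shared top blocks unrolled 3 times, for 24 virtual layers) can be sketched as a layer schedule. The function and block names are illustrative, not the Auron code.

```python
# Hypothetical sketch of the Chimera layer schedule described above.

def chimera_schedule(n_bottom=6, n_top=6, top_loops=3):
    # Bottom blocks run once each with unique weights; the top blocks
    # share weights and are repeated `top_loops` times.
    layers = [f"bottom_{i}" for i in range(n_bottom)]
    for _ in range(top_loops):
        layers += [f"top_{i}" for i in range(n_top)]
    return layers

schedule = chimera_schedule()
virtual_depth = len(schedule)        # 24 virtual layers
unique_blocks = len(set(schedule))   # from only 12 unique blocks
```

This is how 12 unique blocks yield a 24-layer virtual depth; the parameter budget pays only for the unique blocks plus embeddings (761M + 311M above).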

Built by Florian Gasquez (@nyxia). Part of Soulkyn.
