3rd-Order Continuous LLM 500M

A 500M parameter language model with 3rd-order continuous dynamics.
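
Read literally, the name suggests hidden-state trajectories governed by a third-order ODE. The actual formulation is not public; as a generic reading of the name only, such dynamics would look like:

$$\dddot{h}(t) = f_\theta\big(h(t),\ \dot{h}(t),\ \ddot{h}(t),\ x(t)\big)$$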

Non-standard architecture. Custom inference runtime required.

Overview

  • Parameters: ~500M
  • Hidden size: 1024
  • Layers: 28
  • Attention: 16 query heads / 4 KV heads
  • MLP size: 4096
  • Vocabulary size: 151,643
  • Tokenizer: Qwen2.5 tokenizer family
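
For quick reference, the published figures collected into a configuration sketch (field names are hypothetical, not the model's actual config schema):

```python
from dataclasses import dataclass

# Illustrative only: restates the numbers published above.
@dataclass
class ModelConfig:
    hidden_size: int = 1024
    num_layers: int = 28
    num_query_heads: int = 16
    num_kv_heads: int = 4
    mlp_size: int = 4096
    vocab_size: int = 151_643

cfg = ModelConfig()
# Assuming head_dim = hidden_size / num_query_heads (not confirmed by the card):
head_dim = cfg.hidden_size // cfg.num_query_heads  # 64
```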

Public Architecture Features

  • RoPE positional encoding
  • RMSNorm
  • Grouped Query Attention (16Q / 4KV)
  • SiLU MLP
  • bfloat16 weights
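
Aside from the continuous-dynamics component, these are standard Qwen/Llama-style building blocks. A minimal PyTorch sketch of the normalization, MLP, and GQA head layout follows; the gated-MLP assumption and all names are illustrative, and none of this is the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Root-mean-square normalization: scale only, no mean subtraction, no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class GatedSiLUMLP(nn.Module):
    # "SiLU MLP" is assumed here to be the gated (SwiGLU-style) variant common
    # in Qwen/Llama blocks; the card does not confirm gating.
    def __init__(self, dim: int = 1024, hidden: int = 4096):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

# Grouped Query Attention shapes: 4 KV heads shared across 16 query heads.
b, s, d, qh, kvh = 1, 8, 1024, 16, 4
hd = d // qh                                    # head_dim = 64 (assumption)
q = torch.randn(b, qh, s, hd)
k = torch.randn(b, kvh, s, hd)
v = torch.randn(b, kvh, s, hd)
k = k.repeat_interleave(qh // kvh, dim=1)       # broadcast KV across Q groups
v = v.repeat_interleave(qh // kvh, dim=1)
attn = F.scaled_dot_product_attention(q, k, v)  # (1, 16, 8, 64)
```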

Usage

This repository publishes weights only.

The weights are not expected to load or run through standard Hugging Face AutoModel pipelines.

The reason is that this model does not keep the standard separation between training and inference: it runs under an endogenous control regime, without the usual loss-driven split between a training runtime and a frozen inference runtime. In short, inference is training. Conceptually this is closer to the test-time-training (TTT) family of ideas than to a standard frozen-weights LLM runtime, although the mechanism and goals here are different.
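
To make the distinction concrete, here is a schematic of the generic TTT-style pattern the card alludes to, where an inner objective drives weight updates during decoding. This is purely illustrative; the card states that its endogenous-control mechanism and goals differ, and the real runtime is unreleased. All names here are hypothetical:

```python
import torch
import torch.nn.functional as F

def decode_with_inner_updates(model, optimizer, tokens, steps=32):
    # tokens: LongTensor of shape (1, T), T >= 2. Schematic of generic
    # test-time training, NOT this model's actual mechanism.
    for _ in range(steps):
        logits = model(tokens)                           # (1, T, vocab)
        next_tok = logits[:, -1].argmax(-1, keepdim=True)
        # A self-supervised next-token loss stands in for whatever inner
        # objective an endogenous-control regime would actually use.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                 # weights change mid-decode
        tokens = torch.cat([tokens, next_tok], dim=-1)
    return tokens
```

In a standard frozen LLM runtime, the `optimizer` and `loss.backward()` steps would simply not exist; their presence inside the decoding loop is what "inference is training" points at.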

At the moment, only these public details are released. If you are interested in higher-order ODE LLMs, want to request API access, or want to discuss custom runtime/code access, contact:

  • 2218038150@qq.com
  • a2218038150@gmail.com