3rd-Order Continuous LLM 500M

A 500M parameter language model with 3rd-order continuous dynamics.
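
Read literally, the name suggests hidden-state trajectories governed by a third-order ODE. The actual formulation is not public; as a generic reading of the name only, such dynamics would look like:

$$\dddot{h}(t) = f_\theta\big(h(t),\ \dot{h}(t),\ \ddot{h}(t),\ x(t)\big)$$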

Non-standard architecture. Custom inference runtime required.

Overview

  • Parameters: ~500M
  • Hidden size: 1024
  • Layers: 28
  • Attention: 16 query heads / 4 KV heads
  • MLP size: 4096
  • Vocabulary size: 151,643
  • Tokenizer: Qwen2.5 tokenizer family
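
For quick reference, the published figures collected into a configuration sketch (field names are hypothetical, not the model's actual config schema):

```python
from dataclasses import dataclass

# Illustrative only: restates the numbers published above.
@dataclass
class ModelConfig:
    hidden_size: int = 1024
    num_layers: int = 28
    num_query_heads: int = 16
    num_kv_heads: int = 4
    mlp_size: int = 4096
    vocab_size: int = 151_643

cfg = ModelConfig()
# Assuming head_dim = hidden_size / num_query_heads (not confirmed by the card):
head_dim = cfg.hidden_size // cfg.num_query_heads  # 64
```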

Public Architecture Features

  • RoPE positional encoding
  • RMSNorm
  • Grouped Query Attention (16Q / 4KV)
  • SiLU MLP
  • bfloat16 weights
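
Aside from the continuous-dynamics component, these are standard Qwen/Llama-style building blocks. A minimal PyTorch sketch of the normalization, MLP, and GQA head layout follows; the gated-MLP assumption and all names are illustrative, and none of this is the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Root-mean-square normalization: scale only, no mean subtraction, no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class GatedSiLUMLP(nn.Module):
    # "SiLU MLP" is assumed here to be the gated (SwiGLU-style) variant common
    # in Qwen/Llama blocks; the card does not confirm gating.
    def __init__(self, dim: int = 1024, hidden: int = 4096):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

# Grouped Query Attention shapes: 4 KV heads shared across 16 query heads.
b, s, d, qh, kvh = 1, 8, 1024, 16, 4
hd = d // qh                                    # head_dim = 64 (assumption)
q = torch.randn(b, qh, s, hd)
k = torch.randn(b, kvh, s, hd)
v = torch.randn(b, kvh, s, hd)
k = k.repeat_interleave(qh // kvh, dim=1)       # broadcast KV across Q groups
v = v.repeat_interleave(qh // kvh, dim=1)
attn = F.scaled_dot_product_attention(q, k, v)  # (1, 16, 8, 64)
```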

Usage

This repository publishes weights only.

The weights are not expected to load or run through standard Hugging Face AutoModel pipelines.

The reason is that this model does not keep the standard separation between training and inference: it runs under an endogenous control regime, without the usual loss-driven split between a training runtime and a frozen inference runtime. In short, inference is training. Conceptually this is closer to the test-time-training (TTT) family of ideas than to a standard frozen-weights LLM runtime, although the mechanism and goals here are different.
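
To make the distinction concrete, here is a schematic of the generic TTT-style pattern the card alludes to, where an inner objective drives weight updates during decoding. This is purely illustrative; the card states that its endogenous-control mechanism and goals differ, and the real runtime is unreleased. All names here are hypothetical:

```python
import torch
import torch.nn.functional as F

def decode_with_inner_updates(model, optimizer, tokens, steps=32):
    # tokens: LongTensor of shape (1, T), T >= 2. Schematic of generic
    # test-time training, NOT this model's actual mechanism.
    for _ in range(steps):
        logits = model(tokens)                           # (1, T, vocab)
        next_tok = logits[:, -1].argmax(-1, keepdim=True)
        # A self-supervised next-token loss stands in for whatever inner
        # objective an endogenous-control regime would actually use.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            tokens[:, 1:].reshape(-1),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                 # weights change mid-decode
        tokens = torch.cat([tokens, next_tok], dim=-1)
    return tokens
```

In a standard frozen LLM runtime, the `optimizer` and `loss.backward()` steps would simply not exist; their presence inside the decoding loop is what "inference is training" points at.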

At the moment, only these public details are released. If you are interested in higher-order ODE LLMs, want to request API access, or want to discuss custom runtime/code access, contact:

  • 2218038150@qq.com
  • a2218038150@gmail.com