# 3rd-Order Continuous LLM 500M
A 500M-parameter language model with 3rd-order continuous dynamics. The architecture is non-standard and requires a custom inference runtime.
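The exact dynamics are not published. As a generic illustration only, a 3rd-order continuous (ODE) formulation over a hidden state can be written as follows; the function and state variables here are illustrative, not this model's disclosed equations:

```latex
% Generic 3rd-order ODE over a hidden state h(t); f_\theta and x(t)
% are illustrative placeholders, not the model's published dynamics.
\dddot{h}(t) = f_\theta\!\left(h(t),\, \dot{h}(t),\, \ddot{h}(t),\, x(t)\right)
% Standard reduction to a first-order system for numerical integration:
\frac{d}{dt}
\begin{pmatrix} h \\ \dot{h} \\ \ddot{h} \end{pmatrix}
=
\begin{pmatrix} \dot{h} \\ \ddot{h} \\ f_\theta(h, \dot{h}, \ddot{h}, x) \end{pmatrix}
```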
## Overview
- Parameters: ~500M
- Hidden size: 1024
- Layers: 28
- Attention: 16 query heads / 4 KV heads
- MLP size: 4096
- Vocabulary size: 151,643
- Tokenizer family: Qwen2.5
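For reference, the listed hyperparameters can be summarized in a plain config object. A minimal sketch; the field names are hypothetical and do not correspond to the custom runtime's actual configuration:

```python
# Plain-Python summary of the published hyperparameters, for reference
# only. Field names are hypothetical; the custom runtime is not public.
from dataclasses import dataclass

@dataclass
class ModelConfig:
    hidden_size: int = 1024
    num_layers: int = 28
    num_attention_heads: int = 16   # query heads
    num_kv_heads: int = 4           # grouped-query attention
    intermediate_size: int = 4096   # MLP size
    vocab_size: int = 151643        # Qwen2.5 tokenizer vocabulary
```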
## Public Architecture Features
- RoPE positional encoding
- RMSNorm
- Grouped Query Attention (16Q / 4KV)
- SiLU MLP
- bfloat16 weights
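These are all standard components. As one illustration, here is a minimal sketch of the grouped-query attention shapes implied by 16 query heads sharing 4 KV heads; this is generic PyTorch, not the model's own code:

```python
# Shape-level sketch of grouped-query attention (16 Q heads / 4 KV
# heads) with the dimensions from the card above. Generic PyTorch
# illustration, not this model's implementation.
import torch
import torch.nn.functional as F

hidden, n_q, n_kv, seq = 1024, 16, 4, 8
head_dim = hidden // n_q  # 64

q = torch.randn(1, n_q, seq, head_dim)
k = torch.randn(1, n_kv, seq, head_dim)
v = torch.randn(1, n_kv, seq, head_dim)

# Each KV head serves n_q // n_kv = 4 query heads.
k = k.repeat_interleave(n_q // n_kv, dim=1)
v = v.repeat_interleave(n_q // n_kv, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 16, 8, 64])
```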
## Usage
This repository publishes weights only; the model is not expected to load through standard Hugging Face `AutoModel` pipelines.
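The raw tensors can still be inspected, though. A minimal sketch, assuming the repository ships a `model.safetensors` file (the filename is an assumption; adjust it to match the actual files in the repo):

```python
# Inspect tensor names and shapes without loading a model class.
# Assumes a single "model.safetensors" file; adjust for sharded repos.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        print(name, f.get_tensor(name).shape)
```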
The reason the standard pipelines do not apply is that this model does not follow the usual separation between inference and training. It uses an endogenous control regime with no loss-driven runtime split; in short, inference is training. Conceptually this is closer to the test-time-training (TTT) family of ideas than to a standard frozen-weights LLM runtime, although the mechanism and goals here are different.
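To make the "inference is training" idea concrete, here is a generic sketch of the TTT family only: a model that takes gradient steps on its own predictions while decoding. This is explicitly NOT this model's mechanism (the card states its endogenous control regime differs); the loop, loss, and names are all hypothetical:

```python
# Generic test-time-training illustration: the model updates some of
# its own parameters during decoding. NOT this model's actual
# mechanism; purely a sketch of the idea the paragraph above names.
import torch
import torch.nn.functional as F

def decode_with_ttt(model, optimizer, token_ids, steps=32):
    for _ in range(steps):
        logits = model(token_ids)  # (batch, seq, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        # "Inference is training": one gradient step on the model's own
        # next-token prediction loss, taken while generating.
        loss = F.cross_entropy(
            logits[:, :-1].reshape(-1, logits.size(-1)),
            token_ids[:, 1:].reshape(-1),
        )
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        token_ids = torch.cat([token_ids, next_id], dim=-1)
    return token_ids
```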
At the moment, only these public details are released. If you are interested in higher-order ODE LLMs, want to request API access, or want to discuss custom runtime/code access, contact:
2218038150@qq.com / a2218038150@gmail.com