🤗 Hugging Face | 🤖 ModelScope | 🚀 Experience Now
Introduction
Ling-1T is the first flagship non-thinking model in the Ling 2.0 series, featuring 1 trillion total parameters with ≈ 50 billion active parameters per token. Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of efficient reasoning and scalable cognition.
Pre-trained on 20 trillion+ high-quality, reasoning-dense tokens, Ling-1T-base supports up to 128K context length and adopts an evolutionary chain-of-thought (Evo-CoT) process across mid-training and post-training. This curriculum greatly enhances the model's efficiency and reasoning depth, allowing Ling-1T to achieve state-of-the-art performance on multiple complex reasoning benchmarks while balancing accuracy and efficiency.
Flagship-Level Efficient Reasoning
We comprehensively evaluated Ling-1T against leading flagship models, including both open-source giants (e.g., DeepSeek-V3.1-Terminus, Kimi-K2-Instruct-0905) and closed-source APIs (GPT-5-main, Gemini-2.5-Pro). Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates superior complex reasoning ability and overall advantage.
In the AIME 25 benchmark, Ling-1T extends the Pareto frontier of reasoning accuracy vs. reasoning length, showcasing its strength in "efficient thinking and precise reasoning."
Aesthetic Understanding and Front-End Generation
Ling-1T excels in visual reasoning and front-end code generation tasks, combining deep semantic understanding with precise code synthesis. We introduce a hybrid Syntax-Function-Aesthetics reward mechanism, enabling the model to not only generate correct and functional code but also demonstrate a refined sense of visual aesthetics. On ArtifactsBench, Ling-1T ranks first among open-source models, and the benchmark visualizations in this card were, in fact, generated by Ling-1T itself.
Emergent Intelligence at Trillion-Scale
Scaling to the trillion-parameter level has revealed strong emergent reasoning and transfer capabilities. For example, in the BFCL V3 tool-use benchmark, Ling-1T achieves ≈ 70% tool-call accuracy with only light instruction tuning, despite having seen no large-scale trajectory data during training (a minimal tool-call sketch follows at the end of this section). Ling-1T can:
- Interpret complex natural-language instructions
- Transform abstract logic into functional visual components
- Generate cross-platform compatible front-end code
- Create stylistically controlled marketing copy and multi-lingual text
These capabilities form the foundation for general, collaborative human-AI intelligence, which we aim to advance together with the open-source community through Ling-1T's release.
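As a concrete illustration of the tool-use capability measured by BFCL V3, the sketch below issues a single tool-call request through an OpenAI-compatible endpoint (the ZenMux endpoint from the Quickstart section further down). Whether a given provider forwards the `tools` parameter to the model is an assumption here, and the weather tool schema is purely hypothetical.

```python
from openai import OpenAI

# Hypothetical tool schema used only for illustration.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

client = OpenAI(base_url="https://zenmux.ai/api/v1", api_key="<your ZENMUX_API_KEY>")
completion = client.chat.completions.create(
    model="inclusionai/ling-1t",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=[weather_tool],
)
# If the model decides to call the tool, the structured call is returned here.
print(completion.choices[0].message.tool_calls)
```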
Pre-Training at Trillion Scale
The Ling 2.0 architecture was designed from the ground up for trillion-scale efficiency, guided by the Ling Scaling Law (arXiv:2507.17702). This ensures architectural and hyperparameter scalability even under 1e25–1e26 FLOPs of compute.
Key architectural innovations include:
- 1T total / 50B active parameters with a 1/32 MoE activation ratio
- MTP layers for enhanced compositional reasoning
- Aux-loss-free, sigmoid-scoring expert routing with zero-mean updates
- QK Normalization for fully stable convergence
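To make the routing bullet above more concrete, here is a minimal PyTorch sketch of the general idea behind aux-loss-free, sigmoid-scoring expert routing: experts are scored independently with a sigmoid, and a non-learned, zero-mean bias steers selection toward under-used experts instead of an auxiliary balancing loss. Function names, shapes, and the step size are illustrative assumptions, not Ling-1T's actual implementation.

```python
import torch

def route_tokens(hidden, expert_centroids, expert_bias, top_k=8):
    """Score experts per token and pick the top-k.

    hidden:           [num_tokens, d_model] token representations
    expert_centroids: [num_experts, d_model] learned routing weights
    expert_bias:      [num_experts] non-learned balance bias (see update_bias)
    """
    # Sigmoid scoring: each expert is scored independently, unlike softmax routing.
    scores = torch.sigmoid(hidden @ expert_centroids.T)          # [tokens, experts]
    # The bias affects which experts are selected, but not how they are mixed.
    _, topk_idx = torch.topk(scores + expert_bias, top_k, dim=-1)
    topk_scores = torch.gather(scores, -1, topk_idx)
    gates = topk_scores / topk_scores.sum(dim=-1, keepdim=True)  # mixing weights
    return topk_idx, gates

def update_bias(expert_bias, topk_idx, num_experts, step_size=1e-3):
    """Aux-loss-free balancing: push the bias of over-loaded experts down and
    under-loaded experts up, then re-center so the bias stays zero-mean."""
    load = torch.bincount(topk_idx.flatten(), minlength=num_experts).float()
    expert_bias = expert_bias - step_size * torch.sign(load - load.mean())
    return expert_bias - expert_bias.mean()
```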
Ling-1T is the largest FP8-trained foundation model known to date. FP8 mixed-precision training yields a 15%+ end-to-end speedup and improved memory efficiency while maintaining ≤ 0.1% loss deviation from BF16 across 1T tokens. A fine-grained, heterogeneous 1F1B interleaved pipeline further boosts utilization by 40%+. System-level optimizations (fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry) ensure stable trillion-scale training.
Pre-training used over 20T high-quality tokens, with > 40% reasoning-dense data in later stages. Mid-training introduced curated chain-of-thought corpora for "reasoning pre-activation", improving downstream reasoning stability. A custom WSM (Warmup-Stable-Merge) LR scheduler (arXiv:2507.17634) with mid-train checkpoint merging simulates LR decay and boosts generalization.
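For intuition, the "Merge" step of WSM can be read as weight-averaging several checkpoints saved under the constant (stable) learning rate, so that the merged model behaves much like one trained with a decayed LR. The sketch below illustrates that reading only; the paths and the uniform averaging are placeholder assumptions, and the actual scheduler is described in arXiv:2507.17634.

```python
import torch

def merge_checkpoints(checkpoint_paths):
    """Uniformly average the weights of several mid-training checkpoints, all of
    which were saved while training at a constant learning rate."""
    merged, n = None, len(checkpoint_paths)
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")
        if merged is None:
            merged = {k: v.float() / n for k, v in state.items()}
        else:
            for k, v in state.items():
                merged[k] += v.float() / n
    return merged

# e.g. merged_state = merge_checkpoints(["ckpt_18k.pt", "ckpt_19k.pt", "ckpt_20k.pt"])
```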
Post-Training and Evo-CoT Optimization
Built upon mid-training reasoning activation, post-training adopts Evo-CoT (Evolutionary Chain-of-Thought) for progressive reasoning enhancement under controllable cost. This approach continually expands the Pareto frontier of reasoning accuracy vs. efficiency, making it ideal for reflexive non-thinking models.
For reinforcement learning, we introduce LPO (Linguistics-Unit Policy Optimization), a novel sentence-level policy optimization method. Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats sentences as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior. Empirically, LPO offers superior training stability and generalization across reasoning tasks.
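To make the contrast with token-level (GRPO) and sequence-level (GSPO) credit assignment concrete, here is a conceptual sketch of sentence-level reward broadcasting. It is not the LPO algorithm itself; the sentence splitter and the way per-sentence rewards map onto tokens are illustrative assumptions.

```python
import re

def sentence_spans(token_texts):
    """Group token indices into sentence-level units by splitting on
    end-of-sentence punctuation (a deliberately crude heuristic)."""
    spans, current = [], []
    for i, tok in enumerate(token_texts):
        current.append(i)
        if re.search(r"[.!?。！？]\s*$", tok):
            spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans

def broadcast_sentence_rewards(token_texts, sentence_rewards):
    """Give every token the advantage of the sentence it belongs to, so the
    policy gradient credits whole semantic units rather than single tokens
    (token-level) or the entire response at once (sequence-level)."""
    advantages = [0.0] * len(token_texts)
    for span, reward in zip(sentence_spans(token_texts), sentence_rewards):
        for i in span:
            advantages[i] = reward
    return advantages
```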
Evaluation
Ling-1T has been extensively evaluated across knowledge, code, math, reasoning, agent, and alignment benchmarks. It currently stands as the best open-source flagship non-thinking model, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.
Model Downloads
You can download Ling-1T from the following table. If you are located in mainland China, we also provide the model on ModelScope.cn to speed up the download process.
| Model | Context Length | Download |
|---|---|---|
| Ling-1T | 32K -> 128K (YaRN) | 🤗 HuggingFace · 🤖 ModelScope |
Note: If you are interested in previous versions, please visit the past model collections on Hugging Face or ModelScope.
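The weights can also be fetched programmatically. A minimal sketch using huggingface_hub is shown below; the repository id inclusionAI/Ling-1T is assumed here, so check the model page for the exact id before downloading.

```python
from huggingface_hub import snapshot_download

# Download the full Ling-1T repository to a local directory.
# The checkpoint is very large, so make sure there is enough disk space first.
snapshot_download(repo_id="inclusionAI/Ling-1T", local_dir="./Ling-1T")
```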
Quickstart
🚀 Try Online
You can experience Ling-1T online at: ZenMux
API Usage
You can also use Ling-1T through API calls:
```python
from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL to the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API Key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model to use in the format "provider/model-name"
    model="inclusionai/ling-1t",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```
Deployment
SGLang
Environment Preparation
We will submit our model to the official SGLang release later. For now, prepare the environment as follows:
```bash
pip3 install -U sglang sgl-kernel
```
Run Inference
SGLang now supports both BF16 and FP8 models; which one is used depends on the dtype of the model in ${MODEL_PATH}.
Here is an example of running Ling-1T on multiple GPU nodes, where the master node IP is ${MASTER_IP} and the server port is ${PORT}:
- Start server:
```bash
# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0

# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1

# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2

# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3

# This is only an example. Please adjust arguments according to your actual environment.
```
- Client:
```bash
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```
More usage examples can be found in the SGLang documentation.
vLLM
Environment Preparation
```bash
pip install vllm==0.11.0
```
Run Inference
Here is an example of deploying the model on multiple GPU nodes, where the master node IP is ${MASTER_IP}, the server port is ${PORT}, and the model path is ${MODEL_PATH}:
```bash
# step 1. start ray on all nodes

# step 2. start vllm server only on node 0:
vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 8 --pipeline-parallel-size 4 --gpu-memory-utilization 0.85

# This is only an example. Please adjust arguments according to your actual environment.
```
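Once the server is running, it exposes an OpenAI-compatible API. A minimal client sketch follows; the model name must match --served-model-name (my_model in the example above), and the host and port placeholders should be replaced with real values.

```python
from openai import OpenAI

# Query the vLLM server started above.
client = OpenAI(base_url="http://<MASTER_IP>:<PORT>/v1", api_key="EMPTY")
completion = client.chat.completions.create(
    model="my_model",
    messages=[{"role": "user", "content": "Give a one-sentence summary of Ling-1T."}],
)
print(completion.choices[0].message.content)
```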
To handle long context in vLLM using YaRN, we need to follow these two steps:
- Add a `rope_scaling` field to the model's `config.json` file, for example:
```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```
- Use the additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service.
For detailed guidance, please refer to the vLLM instructions.
Limitations & Future Plans
While Ling-1T has made strong progress in efficient reasoning, cross-domain generalization, and training efficiency, several limitations remain:
- GQA-based attention: stable for long-context reasoning but relatively costly. Future versions will adopt hybrid attention to improve efficiency.
- Limited agentic ability: the current model has room to grow in multi-turn interaction, long-term memory, and tool use.
- Instruction and identity issues: occasional deviations or role confusion may occur; future updates will enhance alignment and consistency.
Future versions of Ling-1T will continue to evolve in architecture, reasoning, and alignment, advancing the series toward more general intelligence.
License
This code repository is licensed under the MIT License.