πŸ€— Hugging Face   |   πŸ€– ModelScope    |   πŸ™ Experience Now

Introduction

Ling-1T is the first flagship non-thinking model in the Ling 2.0 series, featuring 1 trillion total parameters with β‰ˆ 50 billion active parameters per token. Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of efficient reasoning and scalable cognition.

Pre-trained on 20 trillion+ high-quality, reasoning-dense tokens, Ling-1T-base supports up to 128K context length and adopts an evolutionary chain-of-thought (Evo-CoT) process across mid-training and post-training. This curriculum greatly enhances the model’s efficiency and reasoning depth, allowing Ling-1T to achieve state-of-the-art performance on multiple complex reasoning benchmarksβ€”balancing accuracy and efficiency.

Flagship-Level Efficient Reasoning

We comprehensively evaluated Ling-1T against leading flagship models, including both open-source giants (e.g., DeepSeek-V3.1-Terminus, Kimi-K2-Instruct-0905) and closed-source APIs (GPT-5-main, Gemini-2.5-Pro). Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates superior complex reasoning ability and overall advantage.

In the AIME 25 benchmark, Ling-1T extends the Pareto frontier of reasoning accuracy vs. reasoning length, showcasing its strength in β€œefficient thinking and precise reasoning.”

Aesthetic Understanding and Front-End Generation

Ling-1T excels in visual reasoning and front-end code generation tasks, combining deep semantic understanding with precise code synthesis. We introduce a hybrid Syntax–Function–Aesthetics reward mechanism, enabling the model to not only generate correct and functional code but also demonstrate a refined sense of visual aesthetics. On ArtifactsBench, Ling-1T ranks first among open-source models, and the benchmark visualizations in this card were, in fact, generated by Ling-1T itself.
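The exact reward design is not spelled out here, so the sketch below is only a schematic illustration of how a hybrid Syntax–Function–Aesthetics signal can be combined into a single scalar; the weights, score ranges, and scorers are hypothetical and are not the reward actually used for Ling-1T.

from dataclasses import dataclass

# Hypothetical sketch only: the weights and scoring functions below are
# illustrative assumptions, not the reward used to train Ling-1T.

@dataclass
class FrontEndScores:
    syntax: float      # e.g., does the generated HTML/CSS/JS parse and lint cleanly? (0-1)
    function: float    # e.g., do automated interaction tests pass? (0-1)
    aesthetics: float  # e.g., score from a learned visual-preference model (0-1)

def hybrid_reward(s: FrontEndScores,
                  w_syntax: float = 0.3,
                  w_function: float = 0.4,
                  w_aesthetics: float = 0.3) -> float:
    # Blend the three signals into one scalar reward for RL.
    return w_syntax * s.syntax + w_function * s.function + w_aesthetics * s.aesthetics

print(hybrid_reward(FrontEndScores(syntax=1.0, function=0.8, aesthetics=0.6)))  # 0.80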

Emergent Intelligence at Trillion-Scale

Scaling to the trillion-parameter level has revealed strong emergent reasoning and transfer capabilities. For example, in the BFCL V3 tool-use benchmark, Ling-1T achieves β‰ˆ 70% tool-call accuracy with only light instruction tuningβ€”despite having seen no large-scale trajectory data during training. Ling-1T can:

  • Interpret complex natural-language instructions
  • Transform abstract logic into functional visual components
  • Generate cross-platform compatible front-end code
  • Create stylistically controlled marketing copy and multi-lingual text

These capabilities form the foundation for general, collaborative human–AI intelligence, which we aim to advance together with the open-source community through Ling-1T’s release.
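As a concrete illustration of the tool-use capability measured by BFCL V3, the snippet below sends a minimal function-calling request through an OpenAI-compatible endpoint. It reuses the ZenMux endpoint from the Quickstart section further down; the tool schema is purely illustrative, and whether a given endpoint forwards the tools field is an assumption you should verify.

from openai import OpenAI

client = OpenAI(
    base_url="https://zenmux.ai/api/v1",
    api_key="<your ZENMUX_API_KEY>",
)

# Purely illustrative tool schema; replace it with your own functions.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

completion = client.chat.completions.create(
    model="inclusionai/ling-1t",
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call shows up here.
print(completion.choices[0].message.tool_calls)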

Pre-Training at Trillion Scale

The Ling 2.0 architecture was designed from the ground up for trillion-scale efficiency, guided by the Ling Scaling Law (arXiv:2507.17702). This ensures architectural and hyperparameter scalability even under 1e25–1e26 FLOPs of compute.

Key architectural innovations include:

  • 1T total / 50B active parameters with a 1/32 MoE activation ratio
  • MTP (multi-token prediction) layers for enhanced compositional reasoning
  • Aux-loss-free, sigmoid-scoring expert routing with zero-mean updates (sketched below)
  • QK Normalization for fully stable convergence
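For intuition only, the following is a schematic sketch of sigmoid-scoring, aux-loss-free routing with a zero-mean bias update; the shapes, top-k value, and update rule are illustrative assumptions rather than the actual Ling-1T implementation.

import torch

def route(hidden, gate_weight, expert_bias, top_k=8):
    # hidden: [tokens, dim]; gate_weight: [num_experts, dim]; expert_bias: [num_experts]
    scores = torch.sigmoid(hidden @ gate_weight.t())        # sigmoid scoring, no softmax
    # The bias only influences which experts are selected, not the combine weights.
    _, expert_idx = torch.topk(scores + expert_bias, top_k, dim=-1)
    gates = torch.gather(scores, 1, expert_idx)
    gates = gates / gates.sum(dim=-1, keepdim=True)         # normalized combine weights
    return expert_idx, gates

def update_bias(expert_bias, expert_idx, num_experts, step_size=1e-3):
    # Aux-loss-free balancing: nudge the bias down for overloaded experts and up
    # for underloaded ones, keeping the overall update zero-mean.
    load = torch.bincount(expert_idx.flatten(), minlength=num_experts).float()
    update = -step_size * torch.sign(load - load.mean())
    update = update - update.mean()
    return expert_bias + update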

Ling-1T is the largest FP8-trained foundation model known to date. FP8 mixed-precision training yields a 15%+ end-to-end speedup, improves memory efficiency, and keeps loss deviation from BF16 within 0.1% across 1T tokens. A fine-grained, heterogeneous 1F1B interleaved pipeline further boosts utilization by 40%+. System-level optimizations, including fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry, ensure stable trillion-scale training.

Pre-training used over 20T high-quality tokens, with more than 40% reasoning-dense data in the later stages. Mid-training introduced curated chain-of-thought corpora for β€œreasoning pre-activation”, improving downstream reasoning stability. A custom WSM (Warmup–Stable–Merge) learning-rate scheduler (arXiv:2507.17634) with mid-training checkpoint merging simulates LR decay and boosts generalization.
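As a rough sketch of the WSM idea (inferred from the scheduler's name and the description above, not from the cited paper's exact recipe), the learning rate warms up and then stays constant, while periodically merging recent checkpoints stands in for the usual decay phase. All constants below are assumptions.

def wsm_lr(step, peak_lr=3e-4, warmup_steps=2000):
    # Warmup followed by a constant ("stable") learning rate; no decay phase.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr

def merge_checkpoints(state_dicts):
    # Uniformly average the last K checkpoints; the merged weights play the role
    # that LR decay normally would for generalization.
    return {key: sum(sd[key] for sd in state_dicts) / len(state_dicts)
            for key in state_dicts[0]}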

Post-Training and Evo-CoT Optimization

Built upon mid-training reasoning activation, post-training adopts Evo-CoT (Evolutionary Chain-of-Thought) for progressive reasoning enhancement under controllable cost. This approach continually expands the Pareto frontier of reasoning accuracy vs. efficiencyβ€”ideal for reflexive non-thinking models.

For reinforcement learning, we introduce LPO (Linguistics-Unit Policy Optimization), a novel sentence-level policy optimization method. Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats sentences as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior. Empirically, LPO offers superior training stability and generalization across reasoning tasks.
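Purely as a schematic illustration (not the published LPO algorithm), the difference from token- and sequence-level methods can be pictured as computing one clipped importance ratio per sentence; the sentence splitter, log-probability inputs, and clipping range below are assumptions.

import math
import re

def split_sentences(text):
    # Naive splitter used only for illustration of the sentence-level action unit.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def lpo_style_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    # logp_new / logp_old: summed log-probs of each sentence under the new / old policy.
    # One importance ratio per sentence, rather than per token (GRPO-style)
    # or per whole sequence (GSPO-style).
    terms = []
    for lp_new, lp_old in zip(logp_new, logp_old):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        terms.append(min(ratio * advantage, clipped * advantage))
    return sum(terms) / len(terms)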

Evaluation

Ling-1T has been extensively evaluated across knowledge, code, math, reasoning, agent, and alignment benchmarks. It currently stands as the best open-source flagship non-thinking model, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.

Model Downloads

You can download Ling-1T from the table below. If you are located in mainland China, the model is also available on ModelScope.cn, which may speed up the download.

Model Context Length Download
Ling-1T 32K -> 128K (YaRN) πŸ€— HuggingFace    πŸ€– ModelScope

Note: If you are interested in previous versions, please visit the past model collections on Hugging Face or ModelScope.
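For programmatic downloads, the standard huggingface_hub snapshot download works as usual; the local directory below is just an example.

from huggingface_hub import snapshot_download

# Downloads the full repository; the weights are very large, so make sure
# you have sufficient disk space and bandwidth.
local_dir = snapshot_download(repo_id="inclusionAI/Ling-1T", local_dir="./Ling-1T")
print(local_dir)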

Quickstart

πŸš€ Try Online

You can experience Ling-1T online at: ZenMux

πŸ”Œ API Usage

You can also use Ling-1T through API calls:

from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL to the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API Key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model to use in the format "provider/model-name"
    model="inclusionai/ling-1t",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)

Deployment

SGLang

Environment Preparation

We plan to submit support for this model to the official SGLang release. For now, you can prepare the environment as follows:

pip3 install -U sglang sgl-kernel

Run Inference

SGLang now supports both BF16 and FP8 checkpoints; which precision is used depends on the dtype of the model at ${MODEL_PATH}.

Here is an example of running Ling-1T across multiple GPU nodes, where the master node IP is ${MASTER_IP} and the server port is ${PORT}:

  • Start server:
# Node 0:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 0 

# Node 1:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 1 

# Node 2:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 2 

# Node 3:
python -m sglang.launch_server --model-path $MODEL_PATH --tp-size 8 --pp-size 4 --dp-size 1 --trust-remote-code --dist-init-addr $MASTER_IP:2345 --port $PORT --nnodes 4 --node-rank 3

# This is only an example. Please adjust arguments according to your actual environment.
  • Client:
curl -s http://${MASTER_IP}:${PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'

More usage examples can be found here.

vLLM

Environment Preparation

pip install vllm==0.11.0

Run Inference

Here is an example of deploying the model across multiple GPU nodes, where the master node IP is ${MASTER_IP}, the server port is ${PORT}, and the model path is ${MODEL_PATH}:

# step 1. start ray on all nodes

# step 2. start vllm server only on node 0:
vllm serve $MODEL_PATH --port $PORT --served-model-name my_model --trust-remote-code --tensor-parallel-size 8 --pipeline-parallel-size 4 --gpu-memory-utilization 0.85

# This is only an example, please adjust arguments according to your actual environment.

To handle long context in vLLM using YaRN, we need to follow these two steps:

  1. Add a rope_scaling field to the model's config.json file, for example:
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
  2. Use the additional parameter --max-model-len to specify the desired maximum context length when starting the vLLM service (e.g., --max-model-len 131072 for the 128K context implied by the factor-4 YaRN scaling above).

For detailed guidance, please refer to the vLLM instructions.

Limitations & Future Plans

While Ling-1T has made strong progress in efficient reasoning, cross-domain generalization, and training efficiency, several limitations remain:

  • GQA-based attention: stable for long-context reasoning but relatively costly. Future versions will adopt hybrid attention to improve efficiency.
  • Limited agentic ability: current model has room to grow in multi-turn interaction, long-term memory, and tool use.
  • Instruction and identity issues: occasional deviations or role confusion may occur; future updates will enhance alignment and consistency.

Future versions of Ling-1T will continue to evolve in architecture, reasoning, and alignment, advancing the series toward more general intelligence.

License

This code repository is licensed under the MIT License.
