---
license: mit
pipeline_tag: text-generation
library_name: transformers
---

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>
</p>

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope</a> | 🐙 <a href="https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI">Experience Now</a></p>

## Introduction

**Ling-1T** is the first flagship *non-thinking* model in the Ling 2.0 series, featuring **1 trillion total parameters** with **≈ 50 billion active parameters per token**.
Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of *efficient reasoning* and *scalable cognition*.

Pre-trained on **20 trillion+ high-quality, reasoning-dense tokens**, Ling-1T-base supports up to **128K context length** and adopts an **evolutionary chain-of-thought (Evo-CoT)** process across mid-training and post-training.
This curriculum greatly enhances the model’s efficiency and reasoning depth, allowing Ling-1T to achieve **state-of-the-art performance** on multiple complex reasoning benchmarks while balancing **accuracy** and **efficiency**.

### Flagship-Level Efficient Reasoning

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/FRNXSJFZGXkAAAAAT-AAAAgADkV7AQFr/original"/>
</p>

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/3in4SJr8YPkAAAAAUNAAAAgADkV7AQFr/original"/>
</p>

We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
Across code generation, software development, competition-level mathematics, professional math, and logical reasoning, Ling-1T consistently demonstrates **superior complex reasoning ability** and an overall advantage.

On the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J8ciS5KbIrwAAAAAceAAAAgADkV7AQFr/original"/>
</p>

### Aesthetic Understanding and Front-End Generation

Ling-1T excels in visual reasoning and front-end code generation tasks, combining deep semantic understanding with precise code synthesis.
We introduce a hybrid *Syntax–Function–Aesthetics* reward mechanism, enabling the model not only to generate correct and functional code but also to demonstrate a refined sense of **visual aesthetics**.
On **ArtifactsBench**, Ling-1T ranks **first among open-source models**, and the benchmark visualizations in this card were, in fact, *generated by Ling-1T itself*.


### Emergent Intelligence at Trillion-Scale

Scaling to the trillion-parameter level has revealed strong **emergent reasoning and transfer capabilities**.
For example, on the **BFCL V3** tool-use benchmark, Ling-1T achieves **≈ 70% tool-call accuracy** with only light instruction tuning, despite having seen no large-scale trajectory data during training.
Ling-1T can:

* Interpret complex natural-language instructions
* Transform abstract logic into functional visual components
* Generate cross-platform compatible front-end code
* Create stylistically controlled marketing copy and multilingual text

These capabilities form the foundation for **general, collaborative human–AI intelligence**, which we aim to advance together with the open-source community through Ling-1T’s release.

### Pre-Training at Trillion Scale

The Ling 2.0 architecture was designed from the ground up for trillion-scale efficiency, guided by the **Ling Scaling Law** ([arXiv:2507.17702](https://arxiv.org/abs/2507.17702)).
This ensures architectural and hyperparameter scalability even under **10²⁵–10²⁶ FLOPs** of compute.

Key architectural innovations include:

* **1 T total / 50 B active parameters** with a **1/32 MoE activation ratio**
* **MTP layers** for enhanced compositional reasoning
* **Aux-loss-free**, **sigmoid-scoring expert routing** with **zero-mean updates**
* **QK Normalization** for fully stable convergence
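
For a rough sense of the parameter arithmetic (the exact dense/expert split is not published, so the following is illustrative only): with most of the 1 T parameters residing in MoE experts, a 1/32 activation ratio engages roughly 1 T / 32 ≈ 31 B expert parameters per token, and the always-active attention, embedding, and shared components account for the remainder of the ≈ 50 B active total.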

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/03WMQZIYxpUAAAAAVTAAAAgADkV7AQFr/original"/>
</p>

Ling-1T is the **largest FP8-trained foundation model** known to date.
FP8 mixed-precision training yields a **15%+ end-to-end speedup** and improved memory efficiency while maintaining **≤ 0.1% loss deviation** from BF16 across **1 T tokens**.
A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40%+.
System-level optimizations (fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry) ensure stable trillion-scale training.

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/y5UVSKACgLEAAAAAVcAAAAgADkV7AQFr/original"/>
</p>

Pre-training used over **20 T high-quality tokens**, with **> 40% reasoning-dense data** in later stages.
Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**,” improving downstream reasoning stability.
A custom **WSM (Warmup–Stable–Merge)** LR scheduler with mid-training checkpoint merging simulates LR decay and boosts generalization.
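
The exact schedule is internal; below is a minimal sketch of the warmup–stable shape under assumed function and parameter names, with the decay phase replaced by checkpoint averaging:

```python
def wsm_lr(step: int, warmup_steps: int, peak_lr: float) -> float:
    """Warmup-Stable phase of a WSM-style schedule (illustrative sketch).

    There is no explicit decay phase: WSM instead merges checkpoints
    saved during the stable plateau, emulating the generalization
    benefit of LR decay without ever lowering the learning rate.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup
    return peak_lr                            # stable plateau


def merge_checkpoints(state_dicts):
    """Uniformly average parameters across checkpoints (assumed merge rule)."""
    keys = state_dicts[0].keys()
    return {k: sum(sd[k] for sd in state_dicts) / len(state_dicts) for k in keys}
```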

### Post-Training and Evo-CoT Optimization

Built upon mid-training reasoning activation, post-training adopts **Evo-CoT (Evolutionary Chain-of-Thought)** for progressive reasoning enhancement under controllable cost.
This approach continually expands the **Pareto frontier** of reasoning accuracy vs. efficiency, making it ideal for reflexive non-thinking models.

For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimization)**, a novel sentence-level policy optimization method.
Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
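
The exact LPO objective has not been published; as a rough PyTorch illustration of the core idea (function names and the PPO-style clipping are assumptions), per-token importance ratios are aggregated into one ratio per sentence, and the clipped policy-gradient term is applied at that granularity:

```python
import torch

def lpo_loss(logp_new, logp_old, sentence_ids, advantage, clip_eps=0.2):
    """Illustrative sentence-level policy-gradient loss (not the official LPO).

    logp_new, logp_old: [T] per-token log-probs under the new / old policy
    sentence_ids:       [T] long tensor, index of the sentence owning each token
    advantage:          scalar advantage for the whole response
    """
    num_sentences = int(sentence_ids.max()) + 1
    # Sum token-level log-ratios within each sentence -> one importance
    # ratio per sentence, the "linguistic unit" acting as the policy action.
    delta = logp_new - logp_old
    sent_log_ratio = torch.zeros(num_sentences).scatter_add_(0, sentence_ids, delta)
    ratio = sent_log_ratio.exp()
    # PPO-style clipping applied per sentence, rather than per token (GRPO)
    # or per whole sequence (GSPO).
    unclipped = ratio * advantage
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.minimum(unclipped, clipped).mean()
```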

<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/kbEWT4BGEQQAAAAAWwAAAAgADkV7AQFr/original"/>
</p>
<p align="center">
    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/aF5LRqK5LMcAAAAAZHAAAAgADkV7AQFr/original"/>
</p>

## Evaluation

Ling-1T has been extensively evaluated across **knowledge**, **code**, **math**, **reasoning**, **agent**, and **alignment** benchmarks.
It currently stands as the **best open-source flagship non-thinking model**, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.

In each row, the best result is highlighted in red and the second best in bold.

| Task | Benchmark | DeepSeek-V3.1-Terminus<br>(NonThinking) | Kimi-K2-Instruct-0905 | GPT-5-main | Gemini 2.5 Pro<br>(thinkBudget=128) | Ling-1T |
| --- | --- | --- | --- | --- | --- | --- |
| **Knowledge** | **Professional Knowledge** | | | | | |
| | C-Eval | __91.76__ | 91.12 | 83.59 | 88.77 | __<span style="color:red">92.19</span>__ |
| | MMLU-Redux (EM) | 92.37 | 91.58 | **92.75** | __<span style="color:red">94.67</span>__ | 92.25 |
| | MMLU-Pro | __<span style="color:red">83.25</span>__ | 81.03 | 81.94 | **82.13** | 82.04 |
| **Knowledge** | **STEM** | | | | | |
| | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span>__ | **88.5** |
| | OlympiadBench-stem | 87.83 | 79.13 | 78.26 | **89.57** | __<span style="color:red">91.3</span>__ |
| | GPQA-Diamond | __<span style="color:red">76.23</span>__ | **73.93** | 71.31 | 71.81 | 72.98 |
| **Coding** | **Code Generation** | | | | | |
| | MultiPL-E | **77.68** | 73.76 | 76.66 | 71.48 | __<span style="color:red">77.91</span>__ |
| | mbpp | 90.69 | 89.96 | **91.72** | 91.01 | __<span style="color:red">96.87</span>__ |
| | LiveCodeBench (2408-2505) | 48.02 | 48.95 | **48.57** | 45.43 | __<span style="color:red">61.68</span>__ |
| | CodeForces-rating | 1582 | 1574 | 1120 | **1675** | __<span style="color:red">1901</span>__ |
| | BIRD_SQL | 44.88 | 46.45 | 43.97 | __<span style="color:red">54.76</span>__ | **52.38** |
| **Coding** | **Software Development** | | | | | |
| | ArtifactsBench | 43.29 | 44.87 | 41.04 | __<span style="color:red">60.28</span>__ | **59.31** |
| | FullStack Bench | **55.48** | 54.00 | 50.92 | 48.19 | __<span style="color:red">56.55</span>__ |
| | Aider | **88.16** | 85.34 | 84.40 | __<span style="color:red">89.85</span>__ | 83.65 |
| **Math** | **Competition Math** | | | | | |
| | CNMO 2024 | 73.78 | 68.92 | 63.11 | **74.65** | __<span style="color:red">79.25</span>__ |
| | AIME 2025 | 55.21 | 50.16 | 59.43 | **70.10** | __<span style="color:red">70.42</span>__ |
| | UGMathBench | **72.70** | 69.97 | 67.27 | 70.10 | __<span style="color:red">74.95</span>__ |
| | Omni-Math | 64.77 | 62.42 | 61.09 | **72.02** | __<span style="color:red">74.46</span>__ |
| **Math** | **Professional Math** | | | | | |
| | FinanceReasoning | 86.44 | 84.83 | 86.28 | **86.65** | __<span style="color:red">87.45</span>__ |
| | Optibench | 64.30 | 60.83 | 40.06 | **68.76** | __<span style="color:red">74.71</span>__ |
| | OptMATH | 35.99 | 35.84 | 39.16 | **42.77** | __<span style="color:red">57.68</span>__ |
| **General Reasoning** | | | | | | |
| | BBEH | **42.86** | 34.83 | 39.75 | 29.08 | __<span style="color:red">47.34</span>__ |
| | KOR-Bench | **73.76** | 73.20 | 70.56 | 59.68 | __<span style="color:red">76.00</span>__ |
| | ARC-AGI-1 | 14.69 | **22.19** | 14.06 | 18.94 | __<span style="color:red">43.81</span>__ |
| | ZebraLogic | 81.6 | **85.5** | 57.3 | 70.2 | __<span style="color:red">90.8</span>__ |
| **Agent** | | | | | | |
| | BFCL-V3 | 52.67 | __<span style="color:red">71.05</span>__ | 50.27 | 63.31 | **69.64** |
| **Alignment** | | | | | | |
| | Arena Hard V2 ELO | 54.09 | __<span style="color:red">76.95</span>__ | 68.37 | 65.37 | **76.26** |
| | Arena Hard V2 Win Rate | 63.24 | 69.88 | 65.06 | **74.46** | __<span style="color:red">75.83</span>__ |
| | writing_bench | 80.95 | **87.59** | 77.07 | 80.53 | __<span style="color:red">89.4</span>__ |
| | Creative Writing v3 | 85.18 | **87.01** | 80.93 | 84.99 | __<span style="color:red">89.24</span>__ |
| | MultiChallenge | 42.49 | 48.72 | 48.72 | **51.28** | __<span style="color:red">58.24</span>__ |

## Model Downloads

You can download Ling-1T from the following table. If you are located in mainland China, we also provide the model on ModelScope.cn to speed up the download process.

<center>

| **Model** | **Context Length** | **Download** |
| :-------: | :----------------: | :----------: |
| Ling-1T | 32K -> 128K (YaRN) | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-1T) [🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-1T) |

</center>
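
For example, the weights can be fetched from the command line with the Hugging Face CLI (adjust the target directory to your environment):

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download inclusionAI/Ling-1T --local-dir ./Ling-1T
```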

Note: If you are interested in previous versions, please visit the past model collections on [Hugging Face](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).

## Quickstart

### 🚀 Try Online

You can experience Ling-1T online at [ZenMux](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI).

### 🔌 API Usage

You can also use Ling-1T through API calls:

```python
from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL to the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model to use in the format "provider/model-name"
    model="inclusionai/ling-1t",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```

### 🤗 Hugging Face Transformers

Here is a code snippet showing how to use the chat model with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ling-1T"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ling, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]
# Render the chat template into a single prompt string
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt", return_token_type_ids=False).to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens, keeping only the newly generated completion
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### 🤖 ModelScope

If you're in mainland China, we strongly recommend using our model from 🤖 <a href="https://modelscope.cn/models/inclusionAI/Ling-1T">ModelScope</a>.
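
As a minimal sketch, the weights can also be fetched with the ModelScope SDK:

```python
from modelscope import snapshot_download

# Download (or reuse a cached copy of) the weights and return the local path
model_dir = snapshot_download("inclusionAI/Ling-1T")
print(model_dir)
```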

## Deployment

### vLLM

vLLM supports both offline batched inference and an OpenAI-compatible API server for online inference.

#### Environment Preparation

```bash
pip install vllm==0.11.0
```

#### Offline Inference:

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ling-1T")

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=16384)

llm = LLM(model="inclusionAI/Ling-1T", dtype='bfloat16', trust_remote_code=True)
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ling, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```

#### Online Inference:

```bash
vllm serve inclusionAI/Ling-1T \
    --tensor-parallel-size 32 \
    --pipeline-parallel-size 1 \
    --trust-remote-code \
    --gpu-memory-utilization 0.90

# This is only an example; please adjust the arguments to your actual environment.
```
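
Once the server is up, it exposes an OpenAI-compatible API (port 8000 by default), so you can query it with any OpenAI client or plain `curl`:

```bash
curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "inclusionAI/Ling-1T",
          "messages": [{"role": "user", "content": "What is the capital of France?"}]
        }'
```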

To handle long contexts in vLLM using YaRN, follow these two steps:
1. Add a `rope_scaling` field to the model's `config.json` file, for example:
```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```
2. Pass the additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service, as in the example below.
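
For example, with the `factor: 4.0` setting above, the effective context window becomes 4.0 × 32768 = 131072 tokens, so the service could be started as:

```bash
vllm serve inclusionAI/Ling-1T \
    --tensor-parallel-size 32 \
    --trust-remote-code \
    --max-model-len 131072
```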

For detailed guidance, please refer to the [vLLM documentation](https://docs.vllm.ai/en/latest/).


### SGLang

#### Environment Preparation

We will submit our model to the official SGLang release later; for now, prepare the environment as follows:
```shell
pip3 install sglang==0.5.2rc0 sgl-kernel==0.3.7.post1
```
You can use the Docker image as well:
```shell
docker pull lmsysorg/sglang:v0.5.2rc0-cu126
```
Then apply our patch to the sglang installation:
```bash
# The `patch` command is required; run `yum install -y patch` if it is missing
patch -d `python -c 'import sglang;import os; print(os.path.dirname(sglang.__file__))'` -p3 < inference/sglang/bailing_moe_v2.patch
```

#### Run Inference

SGLang now supports both BF16 and FP8 models; which one runs depends on the dtype of the model in `${MODEL_PATH}`. Both share the same commands below:

- Start server:
```bash
python -m sglang.launch_server \
    --model-path $MODEL_PATH \
    --host 0.0.0.0 --port $PORT \
    --trust-remote-code \
    --attention-backend fa3

# This is only an example; please adjust the arguments to your actual environment.
```
MTP is supported for the base model, but not yet for the chat model. To enable it, add the parameter `--speculative-algorithm NEXTN` to the launch command.

- Client:

```shell
curl -s http://localhost:${PORT}/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```

More usage examples can be found [here](https://docs.sglang.ai/basic_usage/send_request.html).


## Limitations & Future Plans

While **Ling-1T** has made strong progress in efficient reasoning, cross-domain generalization, and training efficiency, several limitations remain:

* **GQA-based attention**: stable for long-context reasoning but relatively costly. Future versions will adopt **hybrid attention** to improve efficiency.
* **Limited agentic ability**: the current model has room to grow in multi-turn interaction, long-term memory, and tool use.
* **Instruction and identity issues**: occasional deviations or role confusion may occur; future updates will enhance **alignment and consistency**.

Future versions of Ling-1T will continue to evolve in architecture, reasoning, and alignment, advancing the series toward more general intelligence.


## License

This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).