---
license: mit
pipeline_tag: text-generation
library_name: transformers
---

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_qa8qxu/afts/img/A*4QxcQrBlTiAAAAAAQXAAAAgAemJ7AQ/original" width="100"/>
</p>

<p align="center">🤗 <a href="https://huggingface.co/inclusionAI">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/inclusionAI">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;🐙 <a href="https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI">Experience Now</a></p>

## Introduction

**Ling-1T** is the first flagship *non-thinking* model in the Ling 2.0 series, featuring **1 trillion total parameters** with **≈ 50 billion active parameters per token**.
Built on the Ling 2.0 architecture, Ling-1T is designed to push the limits of *efficient reasoning* and *scalable cognition*.

Pre-trained on **20 trillion+ high-quality, reasoning-dense tokens**, Ling-1T-base supports up to **128K context length** and adopts an **evolutionary chain-of-thought (Evo-CoT)** process across mid-training and post-training.
This curriculum greatly enhances the model's efficiency and reasoning depth, allowing Ling-1T to achieve **state-of-the-art performance** on multiple complex reasoning benchmarks while balancing **accuracy** and **efficiency**.

### Flagship-Level Efficient Reasoning

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/FRNXSJFZGXkAAAAAT-AAAAgADkV7AQFr/original"/>
</p>

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/3in4SJr8YPkAAAAAUNAAAAgADkV7AQFr/original"/>
</p>

We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
Across code generation, software development, competition-level mathematics, professional mathematics, and logical reasoning, Ling-1T consistently demonstrates **superior complex reasoning ability** and an overall advantage.

On the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J8ciS5KbIrwAAAAAceAAAAgADkV7AQFr/original"/>
</p>

### Aesthetic Understanding and Front-End Generation

Ling-1T excels in visual reasoning and front-end code generation tasks, combining deep semantic understanding with precise code synthesis.
We introduce a hybrid *Syntax–Function–Aesthetics* reward mechanism, enabling the model not only to generate correct and functional code but also to demonstrate a refined sense of **visual aesthetics**.
On **ArtifactsBench**, Ling-1T ranks **first among open-source models**, and the benchmark visualizations in this card were, in fact, *generated by Ling-1T itself*.
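
For intuition, here is a minimal, hypothetical sketch of how such a composite reward could be wired together. The three scorer stubs (`syntax_score`, `function_score`, `aesthetics_score`) and their weights are illustrative assumptions only, not the actual reward model used in training:

```python
# Hypothetical sketch of a hybrid Syntax-Function-Aesthetics reward.
# The three scorers and their weights are illustrative assumptions only.

def syntax_score(code: str) -> float:
    """Stub: 1.0 if the code parses/compiles, else 0.0."""
    return 1.0

def function_score(code: str) -> float:
    """Stub: fraction of functional checks (e.g., rendered-DOM tests) passed."""
    return 0.8

def aesthetics_score(code: str) -> float:
    """Stub: score in [0, 1] from a learned visual-aesthetics reward model."""
    return 0.7

def hybrid_reward(code: str, w_syntax=0.2, w_function=0.5, w_aesthetics=0.3) -> float:
    # Syntactically invalid code short-circuits to zero reward.
    if syntax_score(code) == 0.0:
        return 0.0
    return (w_syntax * syntax_score(code)
            + w_function * function_score(code)
            + w_aesthetics * aesthetics_score(code))

print(hybrid_reward("<html>...</html>"))  # 0.2 + 0.4 + 0.21 = 0.81
```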

### Emergent Intelligence at Trillion-Scale

Scaling to the trillion-parameter level has revealed strong **emergent reasoning and transfer capabilities**.
For example, on the **BFCL V3** tool-use benchmark, Ling-1T achieves **≈ 70% tool-call accuracy** with only light instruction tuning, despite having seen no large-scale trajectory data during training.
Ling-1T can:

* Interpret complex natural-language instructions
* Transform abstract logic into functional visual components
* Generate cross-platform compatible front-end code
* Create stylistically controlled marketing copy and multilingual text

These capabilities form the foundation for **general, collaborative human–AI intelligence**, which we aim to advance together with the open-source community through Ling-1T's release.

### Pre-Training at Trillion Scale

The Ling 2.0 architecture was designed from the ground up for trillion-scale efficiency, guided by the **Ling Scaling Law** ([arXiv:2507.17702](https://arxiv.org/abs/2507.17702)).
This ensures architectural and hyperparameter scalability even under **10²⁵–10²⁶ FLOPs** of compute.

Key architectural innovations include:

* **1T total / 50B active parameters** with a **1/32 MoE activation ratio**
* **MTP layers** for enhanced compositional reasoning
* **Aux-loss-free**, **sigmoid-scoring expert routing** with **zero-mean updates** (see the routing sketch after this list)
* **QK Normalization** for fully stable convergence
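
To make the routing bullet concrete, the following is a minimal, illustrative sketch of aux-loss-free, sigmoid-scored top-k routing with zero-mean bias updates. The hidden size, expert counts (256 total / 8 active, matching the 1/32 ratio), and the bias update rate are our own assumptions for demonstration, not the production implementation:

```python
# Illustrative sketch only: aux-loss-free sigmoid routing with zero-mean bias updates.
# Hidden size, expert counts, and the update rate are assumptions, not production code.
import torch

hidden_size, num_experts, top_k, update_rate = 4096, 256, 8, 1e-3
gate = torch.nn.Linear(hidden_size, num_experts, bias=False)  # router projection
routing_bias = torch.zeros(num_experts)  # balancing bias, adjusted outside SGD

def route(hidden: torch.Tensor):
    # Sigmoid scoring gives each expert an independent affinity in (0, 1),
    # rather than softmax scores that compete across experts.
    scores = torch.sigmoid(gate(hidden))                   # [tokens, num_experts]
    # The bias shifts only the *selection*; combine weights use raw scores,
    # so no auxiliary balancing loss enters the training objective.
    _, expert_idx = (scores + routing_bias).topk(top_k, dim=-1)
    combine_weights = torch.gather(scores, -1, expert_idx)
    return expert_idx, combine_weights

def update_bias(expert_idx: torch.Tensor) -> None:
    # Zero-mean update: overloaded experts are nudged down, underloaded ones up;
    # the correction sums to zero across experts, so the mean bias stays fixed.
    load = torch.bincount(expert_idx.flatten(), minlength=num_experts).float()
    routing_bias.sub_(update_rate * (load - load.mean()))

idx, weights = route(torch.randn(16, hidden_size))
update_bias(idx)
print(idx.shape, weights.shape)  # torch.Size([16, 8]) torch.Size([16, 8])
```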

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/03WMQZIYxpUAAAAAVTAAAAgADkV7AQFr/original"/>
</p>

Ling-1T is the **largest FP8-trained foundation model** known to date.
FP8 mixed-precision training yields a **15%+ end-to-end speedup** and improved memory efficiency while maintaining **≤ 0.1% loss deviation** from BF16 across **1T tokens**.
A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40%+.
System-level optimizations (fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry) ensure stable trillion-scale training.
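
For intuition on why FP8 can track BF16 so closely, here is a small, self-contained experiment (our own illustration, unrelated to the actual training stack) that round-trips a tensor through PyTorch's `float8_e4m3fn` format and compares the quantization error against a BF16 round-trip. It assumes PyTorch ≥ 2.1 for the float8 dtypes:

```python
# Illustrative only: compare FP8 (e4m3) and BF16 round-trip error on random values.
# Assumes PyTorch >= 2.1 for torch.float8_e4m3fn support.
import torch

x = torch.randn(4096, 4096)  # stand-in for activations/weights

def roundtrip_error(t: torch.Tensor, dtype: torch.dtype) -> float:
    tq = t.to(dtype).to(torch.float32)
    return (tq - t).abs().mean().item() / t.abs().mean().item()

# Per-tensor scaling keeps values inside FP8's narrow dynamic range,
# which is the key trick behind stable FP8 mixed-precision training.
scale = x.abs().max() / 448.0  # 448 is the largest normal value of e4m3
scaled_err = roundtrip_error(x / scale, torch.float8_e4m3fn)

print(f"bf16 relative error: {roundtrip_error(x, torch.bfloat16):.4%}")
print(f"fp8-e4m3 (scaled) relative error: {scaled_err:.4%}")
```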

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/y5UVSKACgLEAAAAAVcAAAAgADkV7AQFr/original"/>
</p>

Pre-training used over **20T high-quality tokens**, with **> 40% reasoning-dense data** in later stages.
Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**”, improving downstream reasoning stability.
A custom **WSM (Warmup–Stable–Merge)** LR scheduler with mid-train checkpoint merging simulates LR decay and boosts generalization.
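
A rough sketch of the WSM idea, under our own simplifying assumptions (a linear warmup into a constant LR, with uniform checkpoint averaging standing in for the merge phase; the actual scheduler and merge policy may differ):

```python
# Rough sketch of a Warmup-Stable-Merge (WSM) schedule: linear warmup into a
# constant LR, with checkpoint averaging replacing an explicit decay phase.
# Hyperparameters and the uniform-merge policy are illustrative assumptions.

def wsm_lr(step: int, peak_lr: float = 3e-4, warmup_steps: int = 2000) -> float:
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # warmup phase
    return peak_lr                            # stable phase: no LR decay

def merge_checkpoints(checkpoints: list[dict]) -> dict:
    # Uniform average of late-stage checkpoints; this weight averaging is
    # what "simulates LR decay" without ever lowering the live LR.
    merged = {}
    for key in checkpoints[0]:
        merged[key] = sum(ckpt[key] for ckpt in checkpoints) / len(checkpoints)
    return merged

# Toy example with scalar "weights":
ckpts = [{"w": 1.0}, {"w": 1.2}, {"w": 0.9}]
print(wsm_lr(1000), wsm_lr(50_000), merge_checkpoints(ckpts))
```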

### Post-Training and Evo-CoT Optimization

Built upon mid-training reasoning activation, post-training adopts **Evo-CoT (Evolutionary Chain-of-Thought)** for progressive reasoning enhancement under controllable cost.
This approach continually expands the **Pareto frontier** of reasoning accuracy vs. efficiency, which is ideal for reflexive non-thinking models.

For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimization)**, a novel sentence-level policy optimization method.
Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
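
To illustrate what “sentence-level” means relative to GRPO and GSPO, here is a toy sketch of importance ratios aggregated at the three granularities. The segmentation rule (splitting at periods) and the aggregation are our own illustrative assumptions, not the published LPO algorithm:

```python
# Toy illustration of policy-optimization granularity: token-level (GRPO-style),
# sequence-level (GSPO-style), and sentence-level (LPO-style) importance ratios.
# Sentence segmentation and aggregation here are illustrative assumptions.
import math

# Per-token log-prob differences between the new and old policy for one response.
tokens    = ["The", "answer", "is", "4",  ".",  "Check", ":",  "2+2=4", "."]
logp_diff = [0.02,  -0.01,    0.03, 0.10, 0.00, -0.02,   0.01, 0.05,    0.00]

# Token-level: one importance ratio per token.
token_ratios = [math.exp(d) for d in logp_diff]

# Sequence-level: a single ratio for the whole response.
seq_ratio = math.exp(sum(logp_diff))

# Sentence-level: group tokens into sentences (here: split at "."), then form
# one ratio per sentence, so reward credit attaches to semantically coherent
# units rather than single tokens or whole sequences.
sentences, current = [], []
for tok, d in zip(tokens, logp_diff):
    current.append(d)
    if tok == ".":
        sentences.append(current)
        current = []
sentence_ratios = [math.exp(sum(s)) for s in sentences]

print(token_ratios, seq_ratio, sentence_ratios)
```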

<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/kbEWT4BGEQQAAAAAWwAAAAgADkV7AQFr/original"/>
</p>
<p align="center">
<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/aF5LRqK5LMcAAAAAZHAAAAgADkV7AQFr/original"/>
</p>

## Evaluation

Ling-1T has been extensively evaluated across **knowledge**, **code**, **math**, **reasoning**, **agent**, and **alignment** benchmarks.
It currently stands as the **best open-source flagship non-thinking model**, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.
In the table below, red marks the best score in each row and bold the runner-up.

| Task | Benchmark | DeepSeek-V3.1-Terminus<br>(NonThinking) | Kimi-K2-Instruct-0905 | gpt-5-main | Gemini 2.5 Pro<br>(thinkBudget=128) | Ling-1T |
| --------------------- | -------------------------- | ---------------------------------------- | ---------------------------------------- | ---------- | ---------------------------------------- | ---------------------------------------- |
| **Knowledge** | **Professional Knowledge** | | | | | |
| | C-Eval | **91.76** | 91.12 | 83.59 | 88.77 | __<span style="color:red">92.19</span>__ |
| | MMLU-Redux (EM) | 92.37 | 91.58 | **92.75** | __<span style="color:red">94.67</span>__ | 92.25 |
| | MMLU-Pro | __<span style="color:red">83.25</span>__ | 81.03 | 81.94 | **82.13** | 82.04 |
| **Knowledge** | **STEM** | | | | | |
| | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span>__ | **88.5** |
| | OlympiadBench-stem | 87.83 | 79.13 | 78.26 | **89.57** | __<span style="color:red">91.3</span>__ |
| | GPQA-Diamond | __<span style="color:red">76.23</span>__ | **73.93** | 71.31 | 71.81 | 72.98 |
| **Coding** | **Code Generation** | | | | | |
| | MultiPL-E | **77.68** | 73.76 | 76.66 | 71.48 | __<span style="color:red">77.91</span>__ |
| | mbpp | 90.69 | 89.96 | **91.72** | 91.01 | __<span style="color:red">96.87</span>__ |
| | LiveCodeBench (2408-2505) | 48.02 | **48.95** | 48.57 | 45.43 | __<span style="color:red">61.68</span>__ |
| | CodeForces-rating | 1582 | 1574 | 1120 | **1675** | __<span style="color:red">1901</span>__ |
| | BIRD_SQL | 44.88 | 46.45 | 43.97 | __<span style="color:red">54.76</span>__ | **52.38** |
| **Coding** | **Software Development** | | | | | |
| | ArtifactsBench | 43.29 | 44.87 | 41.04 | __<span style="color:red">60.28</span>__ | **59.31** |
| | FullStack Bench | **55.48** | 54.00 | 50.92 | 48.19 | __<span style="color:red">56.55</span>__ |
| | Aider | **88.16** | 85.34 | 84.40 | __<span style="color:red">89.85</span>__ | 83.65 |
| **Math** | **Competition Math** | | | | | |
| | CNMO 2024 | 73.78 | 68.92 | 63.11 | **74.65** | __<span style="color:red">79.25</span>__ |
| | AIME 2025 | 55.21 | 50.16 | 59.43 | **70.10** | __<span style="color:red">70.42</span>__ |
| | UGMathBench | **72.70** | 69.97 | 67.27 | 70.10 | __<span style="color:red">74.95</span>__ |
| | Omni-Math | 64.77 | 62.42 | 61.09 | **72.02** | __<span style="color:red">74.46</span>__ |
| **Math** | **Professional Math** | | | | | |
| | FinanceReasoning | 86.44 | 84.83 | 86.28 | **86.65** | __<span style="color:red">87.45</span>__ |
| | Optibench | 64.30 | 60.83 | 40.06 | **68.76** | __<span style="color:red">74.71</span>__ |
| | OptMATH | 35.99 | 35.84 | 39.16 | **42.77** | __<span style="color:red">57.68</span>__ |
| **General Reasoning** | | | | | | |
| | BBEH | **42.86** | 34.83 | 39.75 | 29.08 | __<span style="color:red">47.34</span>__ |
| | KOR-Bench | **73.76** | 73.20 | 70.56 | 59.68 | __<span style="color:red">76.00</span>__ |
| | ARC-AGI-1 | 14.69 | **22.19** | 14.06 | 18.94 | __<span style="color:red">43.81</span>__ |
| | ZebraLogic | 81.6 | **85.5** | 57.3 | 70.2 | __<span style="color:red">90.8</span>__ |
| **Agent** | | | | | | |
| | BFCL-V3 | 52.67 | __<span style="color:red">71.05</span>__ | 50.27 | 63.31 | **69.64** |
| **Alignment** | | | | | | |
| | Arena Hard V2 ELO | 54.09 | __<span style="color:red">76.95</span>__ | 68.37 | 65.37 | **76.26** |
| | Arena Hard V2 Win Rate | 63.24 | 69.88 | 65.06 | **74.46** | __<span style="color:red">75.83</span>__ |
| | writing_bench | 80.95 | **87.59** | 77.07 | 80.53 | __<span style="color:red">89.4</span>__ |
| | Creative Writing v3 | 85.18 | **87.01** | 80.93 | 84.99 | __<span style="color:red">89.24</span>__ |
| | MultiChallenge | 42.49 | 48.72 | 48.72 | **51.28** | __<span style="color:red">58.24</span>__ |

## Model Downloads

You can download Ling-1T from the table below. If you are located in mainland China, we also provide the model on ModelScope.cn to speed up the download process.

<center>

| **Model** | **Context Length** | **Download** |
| :-------: | :----------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: |
| Ling-1T | 32K -> 128K (YaRN) | [🤗 HuggingFace](https://huggingface.co/inclusionAI/Ling-1T) &nbsp;&nbsp; [🤖 ModelScope](https://www.modelscope.cn/models/inclusionAI/Ling-1T) |

</center>

Note: If you are interested in previous versions, please visit the past model collections on [Hugging Face](https://huggingface.co/inclusionAI) or [ModelScope](https://modelscope.cn/organization/inclusionAI).

## Quickstart

### 🚀 Try Online

You can experience Ling-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ling-1t?utm_source=hf_inclusionAI)

### 🔌 API Usage

You can also use Ling-1T through API calls:

```python
from openai import OpenAI

# 1. Initialize the OpenAI client
client = OpenAI(
    # 2. Point the base URL to the ZenMux endpoint
    base_url="https://zenmux.ai/api/v1",
    # 3. Replace with the API key from your ZenMux user console
    api_key="<your ZENMUX_API_KEY>",
)

# 4. Make a request
completion = client.chat.completions.create(
    # 5. Specify the model to use in the format "provider/model-name"
    model="inclusionai/ling-1t",
    messages=[
        {
            "role": "user",
            "content": "What is the meaning of life?"
        }
    ]
)

print(completion.choices[0].message.content)
```

### 🤗 Hugging Face Transformers

Here is a code snippet showing how to use the chat model with `transformers`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "inclusionAI/Ling-1T"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ling, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt", return_token_type_ids=False).to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

### 🤖 ModelScope

If you are in mainland China, we strongly recommend using our model from 🤖 <a href="https://modelscope.cn/models/inclusionAI/Ling-1T">ModelScope</a>.

## Deployment

### vLLM

vLLM supports offline batched inference or launching an OpenAI-compatible API service for online inference.

#### Environment Preparation

```bash
pip install vllm==0.11.0
```

#### Offline Inference

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

tokenizer = AutoTokenizer.from_pretrained("inclusionAI/Ling-1T")

sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=16384)

llm = LLM(model="inclusionAI/Ling-1T", dtype='bfloat16', trust_remote_code=True)
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "system", "content": "You are Ling, an assistant created by inclusionAI"},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
outputs = llm.generate([text], sampling_params)
print(outputs[0].outputs[0].text)
```

#### Online Inference

```bash
vllm serve inclusionAI/Ling-1T \
    --tensor-parallel-size 32 \
    --pipeline-parallel-size 1 \
    --trust-remote-code \
    --gpu-memory-utilization 0.90

# This is only an example; please adjust the arguments according to your actual environment.
```

To handle long context in vLLM using YaRN, we need to follow these two steps:
1. Add a `rope_scaling` field to the model's `config.json` file, for example:
```json
{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
```
2. Use the additional parameter `--max-model-len` to specify the desired maximum context length when starting the vLLM service.
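
If you prefer to script step 1, a small helper like the following (our own convenience snippet; the model path is an assumption, so point it at your local checkpoint directory) patches the config in place:

```python
# Convenience snippet for step 1: add a YaRN rope_scaling block to config.json.
# The model path is an assumption; replace it with your local checkpoint directory.
import json
from pathlib import Path

config_path = Path("/path/to/Ling-1T/config.json")
config = json.loads(config_path.read_text())

# factor 4.0 extends the 32K native window to roughly 128K (32768 * 4).
config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False))
```

With `factor: 4.0`, launching the server with, e.g., `--max-model-len 131072` (32768 × 4) matches the scaled window.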

For detailed guidance, please refer to these vLLM [instructions](https://docs.vllm.ai/en/latest/).

### SGLang

#### Environment Preparation

We will submit our model to the official SGLang release later. For now, prepare the environment with the following steps:
```shell
pip3 install sglang==0.5.2rc0 sgl-kernel==0.3.7.post1
```
You can use the docker image as well:
```shell
docker pull lmsysorg/sglang:v0.5.2rc0-cu126
```
Then apply our patch to the SGLang installation:
```bash
# The `patch` command is required; run `yum install -y patch` if it is missing.
patch -d `python -c 'import sglang;import os; print(os.path.dirname(sglang.__file__))'` -p3 < inference/sglang/bailing_moe_v2.patch
```

#### Run Inference

SGLang now supports both BF16 and FP8 models; which one is used depends on the dtype of the model in `${MODEL_PATH}`. Both share the same commands below:

- Start the server:
```bash
python -m sglang.launch_server \
    --model-path $MODEL_PATH \
    --host 0.0.0.0 --port $PORT \
    --trust-remote-code \
    --attention-backend fa3

# This is only an example; please adjust the arguments according to your actual environment.
```
MTP is supported for the base model, but not yet for the chat model. You can add the parameter `--speculative-algorithm NEXTN` to the start command.

- Query with a client:

```shell
curl -s http://localhost:${PORT}/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "auto", "messages": [{"role": "user", "content": "What is the capital of France?"}]}'
```

More usage examples can be found [here](https://docs.sglang.ai/basic_usage/send_request.html).

## Limitations & Future Plans

While **Ling-1T** has made strong progress in efficient reasoning, cross-domain generalization, and training efficiency, several limitations remain:

* **GQA-based attention**: stable for long-context reasoning but relatively costly. Future versions will adopt **hybrid attention** to improve efficiency.
* **Limited agentic ability**: the current model has room to grow in multi-turn interaction, long-term memory, and tool use.
* **Instruction and identity issues**: occasional deviations or role confusion may occur; future updates will enhance **alignment and consistency**.

Future versions of Ling-1T will continue to evolve in architecture, reasoning, and alignment, advancing the series toward more general intelligence.

## License

This code repository is licensed under [the MIT License](https://github.com/inclusionAI/Ling-V2/blob/main/LICENSE).