Text Generation
Transformers
Safetensors
bailing_moe
conversational
custom_code
zhanghanxiao committed
Commit bc6505f · verified · 1 Parent(s): 60f0774

Update README.md

Files changed (1)
  1. README.md +10 -6
README.md CHANGED
@@ -23,11 +23,11 @@ This curriculum greatly enhances the model’s efficiency and reasoning depth, a
 ### Flagship-Level Efficient Reasoning

 <p align="center">
- <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/X7mZSJQX_fsAAAAAT_AAAAgADkV7AQFr/original"/>
+ <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/FRNXSJFZGXkAAAAAT-AAAAgADkV7AQFr/original"/>
 <p>

 <p align="center">
- <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/DZ1kSKT57J0AAAAAUOAAAAgADkV7AQFr/original"/>
+ <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/3in4SJr8YPkAAAAAUNAAAAgADkV7AQFr/original"/>
 <p>

 We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
@@ -36,7 +36,7 @@ Across code generation, software development, competition-level mathematics, pro
 In the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**

 <p align="center">
- <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/CNhVT4sGM0kAAAAAciAAAAgADkV7AQFr/original"/>
+ <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J8ciS5KbIrwAAAAAceAAAAgADkV7AQFr/original"/>
 <p>

 ### Aesthetic Understanding and Front-End Generation
@@ -72,13 +72,17 @@ Key architectural innovations include:
 * **Aux-loss-free**, **sigmoid-scoring expert routing** with **zero-mean updates**
 * **QK Normalization** for fully stable convergence

+ <p align="center">
+ <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/03WMQZIYxpUAAAAAVTAAAAgADkV7AQFr/original"/>
+ <p>
+
 Ling-1T is the **largest FP8-trained foundation model** known to date.
 FP8 mixed-precision training yields **15 %+ end-to-end speedup**, improved memory efficiency, and maintains **≤ 0.1 % loss deviation** from BF16 across **1 T tokens**.
 A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40 %+.
 System-level optimizations—fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry—ensure stable trillion-scale training.

 <p align="center">
- <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/StIxTrsy-_MAAAAAVTAAAAgADkV7AQFr/original"/>
+ <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/y5UVSKACgLEAAAAAVcAAAAgADkV7AQFr/original"/>
 <p>

 Pre-training used over **20 T high-quality tokens**, with **> 40 % reasoning-dense data** in later stages.
@@ -96,10 +100,10 @@ Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sente
 Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.

 <p align="center">
- <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/o10CRK8P8hwAAAAAWwAAAAgADkV7AQFr/original"/>
+ <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/kbEWT4BGEQQAAAAAWwAAAAgADkV7AQFr/original"/>
 <p>
 <p align="center">
- <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J7I6QZqI-6AAAAAAZHAAAAgADkV7AQFr/original"/>
+ <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/aF5LRqK5LMcAAAAAZHAAAAgADkV7AQFr/original"/>
 <p>

 ## Evaluation
 
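The architecture bullets in the third hunk ("aux-loss-free, sigmoid-scoring expert routing with zero-mean updates" and "QK Normalization") are dense. Below is a minimal PyTorch sketch of what such a router and normalization can look like; the dimensions, the sign-based bias update, and the step size are illustrative assumptions, not Ling-1T's released implementation.

```python
import torch

# Hypothetical sizes for illustration; Ling-1T's real config is not in this diff.
HIDDEN, N_EXPERTS, TOP_K = 1024, 64, 8

class SigmoidRouter(torch.nn.Module):
    """Sketch of aux-loss-free MoE routing: sigmoid expert scores plus a
    non-learned per-expert bias nudged to balance load, replacing an
    auxiliary balancing loss. Details are assumptions."""
    def __init__(self):
        super().__init__()
        self.gate = torch.nn.Linear(HIDDEN, N_EXPERTS, bias=False)
        # Balancing bias: used only for expert *selection*, updated
        # outside autograd (see update_bias).
        self.register_buffer("bias", torch.zeros(N_EXPERTS))

    def forward(self, x):                        # x: (tokens, HIDDEN)
        scores = torch.sigmoid(self.gate(x))     # sigmoid scoring, not softmax
        _, top_idx = (scores + self.bias).topk(TOP_K, dim=-1)
        weights = scores.gather(-1, top_idx)     # combine with unbiased scores
        weights = weights / weights.sum(-1, keepdim=True)
        return top_idx, weights

    @torch.no_grad()
    def update_bias(self, top_idx, step=1e-3):
        # Push bias up for underloaded experts, down for overloaded ones;
        # subtracting the mean keeps the update zero-mean so the overall
        # selection-score scale does not drift.
        load = torch.bincount(top_idx.flatten(), minlength=N_EXPERTS).float()
        update = torch.sign(load.mean() - load)
        self.bias += step * (update - update.mean())

def qk_norm(q, k, eps=1e-6):
    # QK Normalization: RMS-normalize queries and keys per head before
    # attention, bounding logit magnitude and stabilizing training.
    q = q * torch.rsqrt(q.pow(2).mean(-1, keepdim=True) + eps)
    k = k * torch.rsqrt(k.pow(2).mean(-1, keepdim=True) + eps)
    return q, k

# Toy usage: route a batch of tokens, then apply one balancing step.
router = SigmoidRouter()
idx, w = router(torch.randn(32, HIDDEN))
router.update_bias(idx)
```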
 
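The FP8 claim in the same hunk (≤ 0.1 % loss deviation from BF16 over 1 T tokens) rests on scaled low-precision casts. The sketch below shows only the per-tensor quantize/dequantize numerics, assuming a recent PyTorch with the `torch.float8_e4m3fn` dtype; production recipes add delayed/per-block scaling and FP8 GEMM kernels, none of which appear in this README.

```python
import torch

def fp8_roundtrip(t: torch.Tensor) -> torch.Tensor:
    """Per-tensor scaled cast to FP8 (e4m3) and back.

    A minimal sketch of FP8 mixed-precision numerics, not Ling-1T's
    actual training recipe.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max       # 448.0 for e4m3
    scale = fp8_max / t.abs().max().clamp(min=1e-12)     # fit tensor into FP8 range
    q = (t * scale).to(torch.float8_e4m3fn)              # quantize
    return q.to(t.dtype) / scale                         # dequantize

x = torch.randn(4096, 4096, dtype=torch.bfloat16)
err = (fp8_roundtrip(x) - x).abs().mean() / x.abs().mean()
print(f"mean relative round-trip error: {float(err):.4%}")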
 
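The last hunk describes LPO only at a high level: unlike GRPO's token-level and GSPO's sequence-level ratios, it takes the sentence as the semantic unit. The sketch below is a hedged reconstruction of what a sentence-level clipped surrogate could look like; the function name `lpo_surrogate` and every detail of the aggregation and clipping are assumptions, not the released objective.

```python
import torch

def lpo_surrogate(logp_new, logp_old, sent_id, advantage, clip=0.2):
    """Sentence-level clipped policy-gradient surrogate (illustrative).

    Token log-probs are summed within each sentence, giving one
    importance ratio per sentence; clipping is applied at that
    granularity rather than per token (GRPO) or per sequence (GSPO).

    logp_new, logp_old: (T,) token log-probs under new/old policies
    sent_id: (T,) long tensor, index of the sentence each token belongs to
    advantage: scalar advantage for the whole response
    """
    n_sent = int(sent_id.max()) + 1
    # Aggregate token log-ratios into per-sentence log-ratios.
    sent_logratio = torch.zeros(n_sent).index_add_(0, sent_id, logp_new - logp_old)
    ratio = sent_logratio.exp()                   # one ratio per sentence
    clipped = ratio.clamp(1 - clip, 1 + clip)
    # PPO-style pessimistic objective, applied sentence-wise.
    return torch.min(ratio * advantage, clipped * advantage).mean()

# Toy usage: 6 tokens forming 2 sentences, positive advantage.
lp_new = torch.randn(6) * 0.1
lp_old = torch.randn(6) * 0.1
sid = torch.tensor([0, 0, 0, 1, 1, 1])
loss = -lpo_surrogate(lp_new, lp_old, sid, advantage=1.0)
```

Clipping whole sentences rather than tokens is what would plausibly give the stability the README reports: a single off-distribution token cannot be up- or down-weighted in isolation from the sentence it belongs to.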
 