inclusionAI/Ling-1T

@@ -23,11 +23,11 @@ This curriculum greatly enhances the model’s efficiency and reasoning depth, a
 ### Flagship-Level Efficient Reasoning
 <p align="center">
-    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/X7mZSJQX_fsAAAAAT_AAAAgADkV7AQFr/original"/>
+    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/FRNXSJFZGXkAAAAAT-AAAAgADkV7AQFr/original"/>
 <p>
 <p align="center">
-    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/DZ1kSKT57J0AAAAAUOAAAAgADkV7AQFr/original"/>
+    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/3in4SJr8YPkAAAAAUNAAAAgADkV7AQFr/original"/>
 <p>
 We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
@@ -36,7 +36,7 @@ Across code generation, software development, competition-level mathematics, pro
 In the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**
 <p align="center">
-    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/CNhVT4sGM0kAAAAAciAAAAgADkV7AQFr/original"/>
+    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J8ciS5KbIrwAAAAAceAAAAgADkV7AQFr/original"/>
 <p>
 ### Aesthetic Understanding and Front-End Generation
@@ -72,13 +72,17 @@ Key architectural innovations include:
 * **Aux-loss-free**, **sigmoid-scoring expert routing** with **zero-mean updates**
 * **QK Normalization** for fully stable convergence
+<p align="center">
+    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/03WMQZIYxpUAAAAAVTAAAAgADkV7AQFr/original"/>
+<p>
 Ling-1T is the **largest FP8-trained foundation model** known to date.
 FP8 mixed-precision training yields **15 %+ end-to-end speedup**, improved memory efficiency, and maintains **≤ 0.1 % loss deviation** from BF16 across **1 T tokens**.
 A fine-grained, **heterogeneous 1F1B interleaved pipeline** further boosts utilization by 40 %+.
 System-level optimizations—fused kernels, communication scheduling, recomputation, checkpointing, simulation, and telemetry—ensure stable trillion-scale training.
 <p align="center">
-    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/StIxTrsy-_MAAAAAVTAAAAgADkV7AQFr/original"/>
+    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/y5UVSKACgLEAAAAAVcAAAAgADkV7AQFr/original"/>
 <p>
 Pre-training used over **20 T high-quality tokens**, with **> 40 % reasoning-dense data** in later stages.
@@ -96,10 +100,10 @@ Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sente
 Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
 <p align="center">
-    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/o10CRK8P8hwAAAAAWwAAAAgADkV7AQFr/original"/>
+    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/kbEWT4BGEQQAAAAAWwAAAAgADkV7AQFr/original"/>
 <p>
 <p align="center">
-    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/J7I6QZqI-6AAAAAAZHAAAAgADkV7AQFr/original"/>
+    <img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/aF5LRqK5LMcAAAAAZHAAAAgADkV7AQFr/original"/>
 <p>
 ## Evaluation