zhanghanxiao committed on
Commit 994b7a0 · verified · 1 Parent(s): 734bad7

Update README.md

Files changed (1)
  1. README.md +6 -48
README.md CHANGED

@@ -23,11 +23,11 @@ This curriculum greatly enhances the model’s efficiency and reasoning depth, a
 ### Flagship-Level Efficient Reasoning
 
 <p align="center">
-<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/FRNXSJFZGXkAAAAAT-AAAAgADkV7AQFr/original"/>
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/YiXwTb4Q_vsAAAAAT-AAAAgADkV7AQFr/original"/>
 <p>
 
 <p align="center">
-<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/3in4SJr8YPkAAAAAUNAAAAgADkV7AQFr/original"/>
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/MEh7Q5FtzbAAAAAAUQAAAAgADkV7AQFr/original"/>
 <p>
 
 We comprehensively evaluated Ling-1T against leading flagship models, including both **open-source giants** (e.g., *DeepSeek-V3.1-Terminus*, *Kimi-K2-Instruct-0905*) and **closed-source APIs** (*GPT-5-main*, *Gemini-2.5-Pro*).
@@ -73,7 +73,7 @@ Key architectural innovations include:
 * **QK Normalization** for fully stable convergence
 
 <p align="center">
-<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/03WMQZIYxpUAAAAAVTAAAAgADkV7AQFr/original"/>
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/naA9TJe7ttIAAAAAVRAAAAgADkV7AQFr/original"/>
 <p>
 
 Ling-1T is the **largest FP8-trained foundation model** known to date.
@@ -111,51 +111,9 @@ Empirically, LPO offers superior **training stability** and **generalization** a
 Ling-1T has been extensively evaluated across **knowledge**, **code**, **math**, **reasoning**, **agent**, and **alignment** benchmarks.
 It currently stands as the **best open-source flagship non-thinking model**, rivaling closed-source APIs in complex reasoning while maintaining exceptional efficiency and interpretability.
 
-## Evaluation
-| Task | Benchmark | DeepSeek-V3.1-Terminus | Kimi-K2-Instruct-0905 | gpt-5-main | Gemini 2.5 Pro | Ling-1T |
-| --------------------- | -------------------------- | ---------------------------------------- | ---------------------------------------- | ---------- | ---------------------------------------- | ---------------------------------------- |
-| | | (NonThinking) | | | (thinkBudget=128) | |
-| **Knowledge** | **Professional Knowledge** | | | | | |
-| | C-Eval | __91.76__ | 91.12 | 83.59 | 88.77 | __<span style="color:red">92.19</span>__ |
-| | MMLU-Redux (EM) | 92.37 | 91.58 | **92.75** | __<span style="color:red">94.67</span>__ | 92.25 |
-| | MMLU-Pro | __<span style="color:red">83.25</span>__ | 81.03 | 81.94 | **82.13** | 82.04 |
-| **Knowledge** | **STEM** | | | | | |
-| | MMLU-Pro-Stem | 87.91 | 85.30 | 73.45 | __<span style="color:red">88.60</span>__ | **88.5** |
-| | OlympiadBench-stem | 87.83 | 79.13 | 78.26 | **89.57** | __<span style="color:red">91.3</span>__ |
-| | GPQA-Diamond | __<span style="color:red">76.23</span>__ | **73.93** | 71.31 | 71.81 | 72.98 |
-| **Coding** | **Code Generation** | | | | | |
-| | MultiPL-E | **77.68** | 73.76 | 76.66 | 71.48 | __<span style="color:red">77.91</span>__ |
-| | mbpp | 90.69 | 89.96 | **91.72** | 91.01 | __<span style="color:red">96.87</span>__ |
-| | LiveCodeBench (2408-2505) | 48.02 | 48.95 | **48.57** | 45.43 | __<span style="color:red">61.68</span>__ |
-| | CodeForces-rating | 1582 | 1574 | 1120 | **1675** | __<span style="color:red">1901</span>__ |
-| | BIRD_SQL | 44.88 | 46.45 | 43.97 | __<span style="color:red">54.76</span>__ | **52.38** |
-| **Coding** | **Software Development** | | | | | |
-| | ArtifactsBench | 43.29 | 44.87 | 41.04 | __<span style="color:red">60.28</span>__ | **59.31** |
-| | FullStack Bench | **55.48** | 54.00 | 50.92 | 48.19 | __<span style="color:red">56.55</span>__ |
-| | Aider | **88.16** | 85.34 | 84.40 | __<span style="color:red">89.85</span>__ | 83.65 |
-| **Math** | **Competition Math** | | | | | |
-| | CNMO 2024 | 73.78 | 68.92 | 63.11 | **74.65** | __<span style="color:red">79.25</span>__ |
-| | AIME 2025 | 55.21 | 50.16 | 59.43 | **70.10** | __<span style="color:red">70.42</span>__ |
-| | UGMathBench | **72.70** | 69.97 | 67.27 | 70.10 | __<span style="color:red">74.95</span>__ |
-| | Omni-Math | 64.77 | 62.42 | 61.09 | **72.02** | __<span style="color:red">74.46</span>__ |
-| **Math** | **Professional Math** | | | | | |
-| | FinanceReasoning | 86.44 | 84.83 | 86.28 | **86.65** | __<span style="color:red">87.45</span>__ |
-| | Optibench | 64.30 | 60.83 | 40.06 | **68.76** | __<span style="color:red">74.71</span>__ |
-| | OptMATH | 35.99 | 35.84 | 39.16 | **42.77** | __<span style="color:red">57.68</span>__ |
-| **General Reasoning** | | | | | | |
-| | BBEH | **42.86** | 34.83 | 39.75 | 29.08 | __<span style="color:red">47.34</span>__ |
-| | KOR-Bench | **73.76** | 73.20 | 70.56 | 59.68 | __<span style="color:red">76.00</span>__ |
-| | ARC-AGI-1 | 14.69 | **22.19** | 14.06 | 18.94 | __<span style="color:red">43.81</span>__ |
-| | ZebraLogic | 81.6 | **85.5** | 57.3 | 70.2 | __<span style="color:red">90.8</span>__ |
-| **Agent** | | | | | | |
-| | BFCL-V3 | 52.67 | __<span style="color:red">71.05</span>__ | 50.27 | 63.31 | **69.64** |
-| **Alignment** | | | | | | |
-| | Arena Hard V2 ELO | 54.09 | __<span style="color:red">76.95</span>__ | 68.37 | 65.37 | **76.26** |
-| | Arena Hard V2 Win Rate | 63.24 | 69.88 | 65.06 | **74.46** | __<span style="color:red">75.83</span>__ |
-| | writing_bench | 80.95 | **87.59** | 77.07 | 80.53 | __<span style="color:red">89.4</span>__ |
-| | Creative Writing v3 | 85.18 | **87.01** | 80.93 | 84.99 | <span style="color:red">89.24</span> |
-| | MultiChallenge | 42.49 | 48.72 | 48.72 | **51.28** | __<span style="color:red">58.24</span>__ |
-
+<p align="center">
+<img src="https://mdn.alipayobjects.com/huamei_bcz3yt/afts/img/KrwiQZEDHV0AAAAAWkAAAAgADkV7AQFr/original"/>
+<p>
 
 
 ## Model Downloads