Update README.md
README.md
CHANGED
@@ -40,7 +40,7 @@ Note: If you are interested in the previous version, please visit the past model
 ## Deep Thinking & Long-horizon task Execution
 
 <p align="center">
-<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/HAkCTKAY7akAAAAAVdAAAAgADod9AQFr/original />
+<img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/HAkCTKAY7akAAAAAVdAAAAgADod9AQFr/original" />
 </p>
 
 For evaluating the Deep Thinking and Long-term Execution capabilities of Ring-2.5-1T, we selected representative open-source thinking models (DeepSeek-v3.2-Thinking, Kimi-K2.5-Thinking) and closed-source APIs (GPT-5.2-thinking-high, Gemini-3.0-Pro-preview-thinking-high, Claude-Opus-4.5-Extended-Thinking) as references. Ring-2.5-1T achieves state-of-the-art open-source performance across both high-difficulty reasoning tasks—including mathematics, coding, and logical reasoning (IMOAnswerBench, AIME 26, HMMT 25, LiveCodeBench, ARC-AGI-V2)—and long-horizon task execution such as agent search, tool calling, and software engineering (Gaia2-search, Tau2-bench, and SWE-Bench Verified).