LiangJiang committed on
Commit b3087bf · verified · 1 Parent(s): 7a791ea

Update README.md

  1. README.md +1 -1
README.md CHANGED
@@ -40,7 +40,7 @@ Note: If you are interested in the previous version, please visit the past model
  ## Deep Thinking & Long-horizon task Execution
 
  <p align="center">
- <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/HAkCTKAY7akAAAAAVdAAAAgADod9AQFr/original />
+ <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/HAkCTKAY7akAAAAAVdAAAAgADod9AQFr/original" />
  </p>
 
  For evaluating the Deep Thinking and Long-term Execution capabilities of Ring-2.5-1T, we selected representative open-source thinking models (DeepSeek-v3.2-Thinking, Kimi-K2.5-Thinking) and closed-source APIs (GPT-5.2-thinking-high, Gemini-3.0-Pro-preview-thinking-high, Claude-Opus-4.5-Extended-Thinking) as references. Ring-2.5-1T achieves state-of-the-art open-source performance across both high-difficulty reasoning tasks—including mathematics, coding, and logical reasoning (IMOAnswerBench, AIME 26, HMMT 25, LiveCodeBench, ARC-AGI-V2)—and long-horizon task execution such as agent search, tool calling, and software engineering (Gaia2-search, Tau2-bench, and SWE-Bench Verified).