LiangJiang committed · Commit b2c63d0 · verified · 1 Parent(s): 283a808

Update README.md

Files changed (1):
  1. README.md +2 -3

README.md CHANGED
@@ -34,7 +34,7 @@ Note: If you are interested in previous version, please visit the past model col
 
 ## Continuously Evolving Deep Reasoning Capabilities
 
-To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source reasoning models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as math competitions (AIME 25, HMMT 25), code generation (LiveCodeBench, CodeForce), and logical reasoning (ARC-AGI-1). It also exhibits strong competitiveness in comprehensive tasks (Arena-Hard-v2.0), healthcare (HealthBench), and creative writing (Creative Writing v3).
+To evaluate the deep reasoning capabilities of Ring-1T, we selected representative open-source reasoning models (Ring-1T-preview, Deepseek-V3.1-Terminus-Thinking, Qwen-235B-A22B-Thinking-2507) and closed-source APIs (Gemini-2.5-pro and GPT-5-Thinking(High)) as benchmarks. First, compared to the previously open-sourced preview version, Ring-1T demonstrates more balanced performance across various tasks. Furthermore, Ring-1T achieves open-source leading performance on challenging reasoning benchmarks such as **math competitions** (AIME 25, HMMT 25), **code generation** (LiveCodeBench, CodeForce), and **logical reasoning** (ARC-AGI-1). It also exhibits strong competitiveness in **comprehensive tasks** (Arena-Hard-v2.0), **healthcare** (HealthBench), and **creative writing** (Creative Writing v3).
 
 <p align="center">
 <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/5TBESJNjsbAAAAAAYYAAAAgADod9AQFr/original" />
@@ -42,7 +42,7 @@ To evaluate the deep reasoning capabilities of Ring-1T, we selected representati
 
 Although we have implemented string-level and semantic-level contamination filtering for benchmark tasks across all training stages—including pre-training, fine-tuning instructions, and reinforcement learning prompts—rigorous decontamination for earlier published benchmarks remains a significant challenge in the industry. To more objectively analyze Ring-1T's deep reasoning capabilities, we conducted tests using the IMO 2025 (International Mathematical Olympiad) held in July this year and the recently concluded ICPC World Finals 2025 (International Collegiate Programming Contest World Finals).
 
-For the IMO 2025 test, similar to the previous preview version, we integrated Ring-1T into the multi-agent framework AWorld (https://github.com/inclusionAI/AWorld) and used pure natural language reasoning to solve the problems. The results show that Ring-1T solved Problems 1, 3, 4, and 5 in a single attempt (silver medal level at IMO). On the third attempt, it also produced a nearly perfect proof for Problem 2, a geometry proof. For the most challenging Problem 6 (which no AI contestant in IMO 2025 solved correctly), Ring-1T converged to the same answer as Gemini 2.5 Pro—"4048" (the correct answer is 2112). We believe that with ongoing optimizations, Ring-1T has the potential to reach gold medal level at IMO in a single attempt in the future.
+For the **IMO 2025** test, similar to the previous preview version, we integrated Ring-1T into the multi-agent framework AWorld (https://github.com/inclusionAI/AWorld) and used pure natural language reasoning to solve the problems. The results show that Ring-1T solved Problems 1, 3, 4, and 5 in a single attempt (silver medal level at IMO). On the third attempt, it also produced a nearly perfect proof for Problem 2, a geometry proof. For the most challenging Problem 6 (which no AI contestant in IMO 2025 solved correctly), Ring-1T converged to the same answer as Gemini 2.5 Pro—"4048" (the correct answer is 2112). We believe that with ongoing optimizations, Ring-1T has the potential to reach gold medal level at IMO in a single attempt in the future.
 
 <p align="center">
 <img src="https://mdn.alipayobjects.com/huamei_d2byvp/afts/img/mnRJTa5a00gAAAAAQ2AAAAgADod9AQFr/original" width="500"/>
@@ -110,7 +110,6 @@ Ring-1T@Aworld IMO test trajectory: [https://github.com/inclusionAI/AWorld/tree/
 
 ### 🚀 Try Online
 
-**TODO**
 You can experience Ring-1T online at: [ZenMux](https://zenmux.ai/inclusionai/ring-1t?utm_source=hf_inclusionAI)
 
 ### 🔌 API Usage
 
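The API Usage section itself is not included in this diff, so as a purely hypothetical sketch: hosted gateways like ZenMux typically expose an OpenAI-compatible chat-completions API. The endpoint path, model slug (`inclusionai/ring-1t`, guessed from the ZenMux URL above), and sampling parameters below are assumptions, not documented values — check the provider's API reference before use.

```python
import json

# Hypothetical sketch only: the endpoint URL, model slug, and default
# temperature below are illustrative assumptions, not documented values.
API_URL = "https://zenmux.ai/api/v1/chat/completions"  # assumed endpoint

def build_request(prompt: str, model: str = "inclusionai/ring-1t") -> str:
    """Serialize an OpenAI-style chat-completions request body as JSON."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,  # illustrative default, tune per task
    }
    return json.dumps(payload)

body = build_request("Prove that the square root of 2 is irrational.")
print(body)
```

Actually sending the request would additionally require an `Authorization: Bearer <API key>` header, e.g. via `urllib.request` or any OpenAI-compatible client pointed at the gateway's base URL.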