PCL-Reasoner committed on
Commit 6f9971b · verified · 1 parent: c0c37be

Update README.md

Files changed (1)
  1. README.md +3 -32
README.md CHANGED
@@ -35,8 +35,8 @@ The table below compares mainstream models on the AIME24 and AIME25 benchmarks.
  </tr>
  <tr>
  <td>DeepSeek-R1-0528</td>
- <td><span style="color:red">91.4</span></td>
- <td><span style="color:red">87.5</span></td>
+ <td><span style="color:grey">91.4</span></td>
+ <td><span style="color:grey">87.5</span></td>
  </tr>
  <tr>
  <td>Qwen3-235B-A22B</td>
@@ -50,7 +50,7 @@ The table below compares mainstream models on the AIME24 and AIME25 benchmarks.
  </tr>
  <tr>
  <td>Gemini-2.5-Pro-0506</td>
- <td><span style="color:red">90.8</span></td>
+ <td><span style="color:grey">90.8</span></td>
  <td><span style="color:grey">83</span></td>
  </tr>
  <!-- Merged row header: 32B -->
@@ -89,32 +89,3 @@ The table below compares mainstream models on the AIME24 and AIME25 benchmarks.
  <td><b>84.2</b></td>
  </tr>
  </table>
-
- > *Note: Generated results for AIME24/25 are available in the [`pcl_reasoner_v1/eval/eval_res`](https://openi.pcl.ac.cn/PCL-Reasoner/V1) directory for developer verification and comparison.*
-
- #### Impact of Answer Length on Accuracy
- We analyzed the relationship between maximum answer length (`max_tokens`) and model accuracy. The results below show that on AIME24, the relatively easier benchmark, a decode length of 64K tokens is sufficient to reach the peak accuracy of 85.7%. In contrast, the harder AIME25 requires 128K tokens to reach its best performance (84.2%):
-
- <table>
- <tr>
- <th>max tokens</th>
- <th>16K</th>
- <th>32K</th>
- <th>64K</th>
- <th>128K</th>
- </tr>
- <tr>
- <td>AIME24</td>
- <td>42.0</td>
- <td>77.9</td>
- <td>85.7</td>
- <td>85.7</td>
- </tr>
- <tr>
- <td>AIME25</td>
- <td>33.4</td>
- <td>75.6</td>
- <td>83.9</td>
- <td>84.2</td>
- </tr>
- </table>