Update README.md
README.md CHANGED

```diff
@@ -35,8 +35,8 @@ The table below compares mainstream models on the AIME24 and AIME25 benchmarks.
 </tr>
 <tr>
 <td>DeepSeek-R1-0528</td>
-<td><span style="color:
-<td><span style="color:
+<td><span style="color:grey">91.4</span></td>
+<td><span style="color:grey">87.5</span></td>
 </tr>
 <tr>
 <td>Qwen3-235B-A22B</td>
@@ -50,7 +50,7 @@ The table below compares mainstream models on the AIME24 and AIME25 benchmarks.
 </tr>
 <tr>
 <td>Gemini-2.5-Pro-0506</td>
-<td><span style="color:
+<td><span style="color:grey">90.8</span></td>
 <td><span style="color:grey">83</span></td>
 </tr>
 <!-- 合并行表头 32B -->
@@ -89,32 +89,3 @@ The table below compares mainstream models on the AIME24 and AIME25 benchmarks.
 <td><b>84.2</b></td>
 </tr>
 </table>
-
-> *Note: Generated results for AIME24/25 are available in the [`pcl_reasoner_v1/eval/eval_res`](https://openi.pcl.ac.cn/PCL-Reasoner/V1) directory for developer verification and comparison.*
-
-#### Impact of Answer Length on Accuracy
-We analyzed the relationship between maximum answer length (`max_tokens`) and model accuracy. Due to results listed below, we find that on AIME24 which is relatively simpler, decode length of 64K are sufficient to achieve peak accuracy of 85.7%. In contrast, AIME25 which is relatively harder requires 128K tokens to reach optimal performance (84.2%):
-
-<table>
-<tr>
-<th>max tokens</th>
-<th>16K</th>
-<th>32K</th>
-<th>64K</th>
-<th>128K</th>
-</tr>
-<tr>
-<td>AIME24</td>
-<td>42.0</td>
-<td>77.9</td>
-<td>85.7</td>
-<td>85.7</td>
-</tr>
-<tr>
-<td>AIME25</td>
-<td>33.4</td>
-<td>75.6</td>
-<td>83.9</td>
-<td>84.2</td>
-</tr>
-</table>
```
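The last hunk removes the `max_tokens`-vs-accuracy table. Its claim — that a 64K decode budget already reaches peak accuracy on AIME24, while AIME25 needs 128K — can be sanity-checked from the removed values themselves. A minimal sketch (the `ACCURACY` dict and helper name are hypothetical; the numbers are copied from the removed table, with 16K/32K/64K/128K taken as token counts):

```python
# Accuracy (%) by maximum decode length, copied from the table removed in this diff.
# Keys are max_tokens budgets: 16K, 32K, 64K, 128K.
ACCURACY = {
    "AIME24": {16_384: 42.0, 32_768: 77.9, 65_536: 85.7, 131_072: 85.7},
    "AIME25": {16_384: 33.4, 32_768: 75.6, 65_536: 83.9, 131_072: 84.2},
}

def smallest_peak_budget(benchmark: str) -> int:
    """Return the smallest max_tokens budget that already reaches peak accuracy."""
    scores = ACCURACY[benchmark]
    peak = max(scores.values())
    return min(tokens for tokens, acc in scores.items() if acc == peak)

print(smallest_peak_budget("AIME24"))  # 65536  -> 64K suffices (85.7%)
print(smallest_peak_budget("AIME25"))  # 131072 -> 128K needed (84.2%)
```

This reproduces the removed prose: AIME24 saturates at 64K, while AIME25 only reaches its best score at the 128K budget.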