PCL-Reasoner committed
Commit d4f05ea · verified · 1 Parent(s): d85a88a

Update README.md

Files changed (1):
  1. README.md (+7 -10)
README.md CHANGED
@@ -35,13 +35,10 @@ model-index:
  # **PCL-Reasoner-V1.5**

  ## Model Overview
- We release **PCL-Reasoner-V1.5**, a 32B reasoning model built upon **PCL-Reasoner-V1** and further enhanced through **offline reinforcement learning** method on the **vllm-ascend** and **MindSpeed-LLM framework** with **Ascend hardware acceleration**. Building on the strong foundation of PCL-Reasoner-V1, PCL-Reasoner-V1.5 achieves even greater improvement in complex mathematical reasoning with long chains of thought (CoT), demonstrating state-of-the-art performance among 32B-scale models.
-
- PCL-Reasoner-V1.5 attains **90.9% on AIME 2024** and **85.7% on AIME 2025**, significantly outperforming prior 32B-class models and closing the gap with much larger systems. This advancement stems from refined data curation, improved contamination filtering, and optimized training dynamics tailored for deep reasoning tasks.
-
+ We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model achieves state-of-the-art performance among models post-trained on Qwen2.5-32B, attaining average accuracies of 90.9% on AIME 2024 and 85.6% on AIME 2025. Our work demonstrates offline RL as a stable and efficient paradigm for advancing reasoning in LLMs. All experiments were conducted on Huawei Ascend 910C NPUs.
  ![Evaluation Results](images/benchmark.png)

- We have fully open-sourced the **model weights**, **dataset**, and **training code** to foster transparency, reproducibility, and community innovation. Follow the tutorial below to deploy, evaluate, or extend PCL-Reasoner-V1.5 in your own research!
+

  ## Code

@@ -142,8 +139,8 @@ All results are reported using the **Avg@32 metric** (average accuracy over 32 i
  </tr>
  <tr>
  <td>PCL-Reasoner-v1</td>
- <td><p style="font-weight:grey;">85.7</p></td>
- <td><p style="font-weight:grey;">84.2</p></td>
+ <td><p style="color:grey">85.7</p></td>
+ <td><p style="color:grey">84.2</p></td>
  </tr>
  <tr>
  <td>PCL-Reasoner-v1.5</td>
@@ -158,9 +155,9 @@ All results are reported using the **Avg@32 metric** (average accuracy over 32 i

  ```bibtex
  @article{PCL-Reasoner-v1.5,
- title={PCL-Reasoner-v1.5: A Math Problem Solver with Chain of Thought Reasoning},
- author={Yao Lu, Deng Dong Fan, Jianzheng Nie, et al.},
- journal={arXiv preprint arXiv:2405.14524},
+ title={PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning},
+ author={Yao Lu and Dengdong Fan and Jianzheng Nie and Fan Xu and Jie Chen and Bin Zhou and Yonghong Tian},
+ journal={arXiv preprint arXiv:2601.14716},
  year={2026}
  }
  ```
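
The hunk context above references the **Avg@32 metric** (average accuracy over 32 independent samples per problem). As a minimal illustration of how such a metric is computed — not the project's actual evaluation code; the function and variable names here are hypothetical — it can be sketched as:

```python
def avg_at_k(sample_correctness: list[list[bool]]) -> float:
    """Avg@k: for each problem, average correctness over its k independent
    samples, then average across problems (the README uses k = 32)."""
    per_problem = [sum(samples) / len(samples) for samples in sample_correctness]
    return sum(per_problem) / len(per_problem)

# Two problems with 4 samples each (k = 4 for brevity):
runs = [
    [True, True, False, True],    # problem 1: 3/4 correct
    [True, False, False, False],  # problem 2: 1/4 correct
]
print(avg_at_k(runs))  # 0.5
```

Averaging per problem first, then across problems, keeps each problem weighted equally even if sample counts differ.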