Update README.md

README.md CHANGED
@@ -35,13 +35,10 @@ model-index:
 # **PCL-Reasoner-V1.5**
 
 ## Model Overview
-We
-
-PCL-Reasoner-V1.5 attains **90.9% on AIME 2024** and **85.7% on AIME 2025**, significantly outperforming prior 32B-class models and closing the gap with much larger systems. This advancement stems from refined data curation, improved contamination filtering, and optimized training dynamics tailored for deep reasoning tasks.
-
+We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model achieves state-of-the-art performance among models post-trained on Qwen2.5-32B, attaining average accuracies of 90.9% on AIME 2024 and 85.6% on AIME 2025. Our work demonstrates offline RL as a stable and efficient paradigm for advancing reasoning in LLMs. All experiments were conducted on Huawei Ascend 910C NPUs.
 
-
+
 
 ## Code
 
@@ -142,8 +139,8 @@ All results are reported using the **Avg@32 metric** (average accuracy over 32 i
 </tr>
 <tr>
 <td>PCL-Reasoner-v1</td>
-<td><p style="
-<td><p style="
+<td><p style="color:grey">85.7</p></td>
+<td><p style="color:grey">84.2</p></td>
 </tr>
 <tr>
 <td>PCL-Reasoner-v1.5</td>
@@ -158,9 +155,9 @@ All results are reported using the **Avg@32 metric** (average accuracy over 32 i
 
 ```bibtex
 @article{PCL-Reasoner-v1.5,
-title={PCL-Reasoner-
-author={Yao Lu,
-journal={arXiv preprint arXiv:
+title={PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning},
+author={Yao Lu, Dengdong Fan, Jianzheng Nie, Fan Xu, Jie Chen, Bin Zhou, Yonghong Tian},
+journal={arXiv preprint arXiv:2601.14716},
 year={2026}
 }
 ```
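For reference, the **Avg@32 metric** named in the hunk headers (average accuracy over 32 independent samples per problem) can be sketched as below; the function name and data layout are illustrative assumptions, not code from the repository:

```python
# Hypothetical sketch of Avg@32: for each problem, sample k answers,
# score each 0/1, average per problem, then average across problems.

def avg_at_k(correctness, k=32):
    """correctness: one list of 0/1 outcomes per problem, one entry per sample.

    Returns the mean per-problem accuracy over the first k samples,
    expressed as a percentage.
    """
    per_problem = [sum(runs[:k]) / k for runs in correctness]
    return 100.0 * sum(per_problem) / len(per_problem)

# Two problems, 32 sampled answers each (1 = correct answer).
runs = [[1] * 29 + [0] * 3, [1] * 26 + [0] * 6]
print(round(avg_at_k(runs), 1))  # 29/32 and 26/32 average to 85.9
```

Averaging over 32 samples smooths out decoding variance, which matters on small benchmarks like AIME (30 problems), where a single pass@1 run can swing several points.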