---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-32B/blob/main/LICENSE
language:
- en
- zh
pipeline_tag: text-generation
datasets:
- PCL-Reasoner/V1.5-RL-Math
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-32B
tags:
- math
model-index:
- name: PCL-Reasoner/V1.5
  results:
  - task:
      type: text-generation
    dataset:
      name: Aime24
      type: Aime24
    metrics:
    - name: Aime24
      type: Aime24
      value: 90.9
    - name: Aime25
      type: Aime25
      value: 85.6
---

# **PCL-Reasoner-V1.5**

## Model Overview

We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model achieves state-of-the-art performance among models post-trained on Qwen2.5-32B, attaining average accuracies of 90.9% on AIME 2024 and 85.6% on AIME 2025. Our work demonstrates offline RL as a stable and efficient paradigm for advancing reasoning in LLMs.

All experiments were conducted on Huawei Ascend 910C NPUs. Both training and evaluation processes utilize FP16 precision to maintain numerical accuracy.

![Evaluation Results](images/benchmark.png)

## Code

[GitHub Repository](https://github.com/PCL-Reasoner/V1.5)

## RL Dataset

[Huggingface Dataset](https://huggingface.co/datasets/PCL-Reasoner/V1.5-RL-Math)

## Evaluation

All results are reported using the **pass@1 metric** (averaged over 32 independent sampling attempts per problem), ensuring robust and fair comparison.
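
| Model Scale | Model | AIME 24 | AIME 25 |
|---|---|---|---|
| >100B | DeepSeek-R1 | 79.8 | 70 |
| >100B | DeepSeek-R1-0528 | 91.4 | 87.5 |
| >100B | Qwen3-235B-A22B | 85.7 | 81.5 |
| >100B | OpenAI-o3 | 91.6 | 88.9 |
| >100B | Gemini-2.5-Pro-0506 | 90.8 | 83 |
| 32B | Qwen3-32B | 81.4 | 72.9 |
| 32B | QwQ-32B | 79.5 | 69.5 |
| 32B | DeepSeek-R1-Distill-Qwen-32B | 72.6 | 49.6 |
| 32B | Skywork-OR1-32B | 82.2 | 73.3 |
| 32B | AM-Thinking-v1 | 85.3 | 74.4 |
| 32B | OpenReasoning-Nemotron-32B | 89.2 | 84.2 |
| 32B | PCL-Reasoner-v1 | 85.7 | 84.2 |
| 32B | PCL-Reasoner-v1.5 | 90.9 | 85.7 |
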
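
The averaged pass@1 figure described in this section can be illustrated with a minimal sketch. It assumes each sampled answer has already been graded as correct or incorrect. The function names `pass_at_1` and `benchmark_accuracy` and the toy data are hypothetical and are not taken from the PCL-Reasoner evaluation code.

```python
from statistics import mean


def pass_at_1(correct_flags):
    """Fraction of sampled answers that are correct for one problem."""
    if not correct_flags:
        raise ValueError("need at least one sample per problem")
    return sum(bool(c) for c in correct_flags) / len(correct_flags)


def benchmark_accuracy(per_problem_flags):
    """Mean per-problem pass@1 across a benchmark, as a percentage."""
    return 100.0 * mean(pass_at_1(flags) for flags in per_problem_flags)


# Hypothetical toy data: 2 problems with 4 graded samples each.
# The evaluation described here draws 32 samples per problem.
toy = [
    [True, True, False, True],    # 3 of 4 correct -> 0.75
    [False, False, True, False],  # 1 of 4 correct -> 0.25
]
print(benchmark_accuracy(toy))  # 50.0
```

Averaging over many samples per problem reduces the variance of the score on small benchmarks such as AIME, where a single sampled answer per problem would make the result sensitive to sampling noise.
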
## Citation

```bibtex
@article{PCL-Reasoner-v1.5,
  title={PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning},
  author={Yao Lu and Dengdong Fan and Jianzheng Nie and Fan Xu and Jie Chen and Bin Zhou and Yonghong Tian},
  journal={arXiv preprint arXiv:2601.14716},
  year={2026}
}
```