---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-32B/blob/main/LICENSE
language:
- en
- zh
pipeline_tag: text-generation
datasets:
- PCL-Reasoner/V1.5-RL-Math
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-32B
tags:
- math
model-index:
- name: PCL-Reasoner/V1.5
  results:
  - task:
      type: text-generation
    dataset:
      name: Aime24
      type: Aime24
    metrics:
    - name: Aime24
      type: Aime24
      value: 90.9
    - name: Aime25
      type: Aime25
      value: 85.6
---

# **PCL-Reasoner-V1.5**

## Model Overview

We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model achieves state-of-the-art performance among models post-trained on Qwen2.5-32B, attaining average accuracies of 90.9% on AIME 2024 and 85.6% on AIME 2025. Our work demonstrates offline RL as a stable and efficient paradigm for advancing reasoning in LLMs. All experiments were conducted on Huawei Ascend 910C NPUs. Both training and evaluation processes utilize FP16 precision to maintain numerical accuracy.

![Evaluation Results](images/benchmark.png)

## Code

[GitHub Repository](https://github.com/PCL-Reasoner/V1.5)

## RL Dataset

[Huggingface Dataset](https://huggingface.co/datasets/PCL-Reasoner/V1.5-RL-Math)

## Evaluation

All results are reported using the **pass@1 metric** (averaged over 32 independent sampling attempts per problem), ensuring robust and fair comparison.

| Model Scale | Model | AIME 24 | AIME 25 |
|---|---|---|---|
| >100B | DeepSeek-R1 | 79.8 | 70 |
| | DeepSeek-R1-0528 | 91.4 | 87.5 |
| | Qwen3-235B-A22B | 85.7 | 81.5 |
| | OpenAI-o3 | 91.6 | 88.9 |
| | Gemini-2.5-Pro-0506 | 90.8 | 83 |
| 32B | Qwen3-32B | 81.4 | 72.9 |
| | QwQ-32B | 79.5 | 69.5 |
| | DeepSeek-R1-Distill-Qwen-32B | 72.6 | 49.6 |
| | Skywork-OR1-32B | 82.2 | 73.3 |
| | AM-Thinking-v1 | 85.3 | 74.4 |
| | OpenReasoning-Nemotron-32B | 89.2 | 84.2 |
| | PCL-Reasoner-v1 | 85.7 | 84.2 |
| | PCL-Reasoner-v1.5 | 90.9 | 85.7 |
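
The scores in this section are pass@1 averaged over 32 sampled answers per problem. As a minimal sketch of how such a number could be computed, assuming each sampled answer has already been graded as correct or incorrect, the snippet below averages correctness within each problem and then across problems. The function names (`pass_at_1`, `benchmark_accuracy`) and the toy data are hypothetical and are not taken from the PCL-Reasoner evaluation code.

```python
from statistics import mean


def pass_at_1(correct_flags):
    """Fraction of sampled answers that are correct for one problem."""
    if not correct_flags:
        raise ValueError("need at least one sample per problem")
    return sum(bool(flag) for flag in correct_flags) / len(correct_flags)


def benchmark_accuracy(per_problem_flags):
    """Mean per-problem pass@1 over a benchmark, as a percentage."""
    return 100.0 * mean(pass_at_1(flags) for flags in per_problem_flags)


# Toy data: 2 problems x 4 samples each (hypothetical grades).
# The model card samples 32 answers per problem instead of 4.
toy_grades = [
    [True, True, False, True],    # 3 of 4 samples correct
    [False, False, True, False],  # 1 of 4 samples correct
]
score = benchmark_accuracy(toy_grades)
print(f"{score:.1f}")  # prints 50.0
```

Averaging over many samples per problem reduces the variance that a single sampled answer would introduce, which matters on small benchmarks such as AIME, where one problem shifts the score by several points.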

## Citation

```bibtex
@article{PCL-Reasoner-v1.5,
  title={PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning},
  author={Yao Lu and Dengdong Fan and Jianzheng Nie and Fan Xu and Jie Chen and Bin Zhou and Yonghong Tian},
  journal={arXiv preprint arXiv:2601.14716},
  year={2026}
}
```