---
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-32B/blob/main/LICENSE
language:
- en
- zh
pipeline_tag: text-generation
datasets:
- PCL-Reasoner/V1.5-RL-Math
metrics:
- accuracy
base_model:
- Qwen/Qwen2.5-32B
tags:
- math
model-index:
- name: PCL-Reasoner/V1.5
  results:
  - task:
      type: text-generation
    dataset:
      name: Aime24
      type: Aime24
    metrics:
    - name: Aime24
      type: Aime24
      value: 90.9
    - name: Aime25
      type: Aime25
      value: 85.6
---

# **PCL-Reasoner-V1.5**

## Model Overview

We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model achieves state-of-the-art performance among models post-trained on Qwen2.5-32B, attaining average accuracies of 90.9% on AIME 2024 and 85.6% on AIME 2025. Our work demonstrates offline RL as a stable and efficient paradigm for advancing reasoning in LLMs. All experiments were conducted on Huawei Ascend 910C NPUs. Both training and evaluation processes utilize FP16 precision to maintain numerical accuracy.

## Code

[GitHub Repository](https://github.com/PCL-Reasoner/V1.5)

## RL Dataset

[Huggingface Dataset](https://huggingface.co/datasets/PCL-Reasoner/V1.5-RL-Math)

## Evaluation

All results are reported using the **pass@1 metric** (averaged over 32 independent sampling attempts per problem), ensuring robust and fair comparison.

| Model Scale | Model | AIME 24 | AIME 25 |
|---|---|---|---|
| >100B | | | |
| | DeepSeek-R1 | 79.8 | 70 |
| | DeepSeek-R1-0528 | 91.4 | 87.5 |
| | Qwen3-235B-A22B | 85.7 | 81.5 |
| | OpenAI-o3 | 91.6 | 88.9 |
| | Gemini-2.5-Pro-0506 | 90.8 | 83 |
| 32B | | | |
| | Qwen3-32B | 81.4 | 72.9 |
| | QwQ-32B | 79.5 | 69.5 |
| | DeepSeek-R1-Distill-Qwen-32B | 72.6 | 49.6 |
| | Skywork-OR1-32B | 82.2 | 73.3 |
| | AM-Thinking-v1 | 85.3 | 74.4 |
| | OpenReasoning-Nemotron-32B | 89.2 | 84.2 |
| | PCL-Reasoner-V1 | 85.7 | 84.2 |
| | PCL-Reasoner-V1.5 | 90.9 | 85.7 |
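
The averaged pass@1 metric described in this section can be sketched as follows. This is a minimal illustration, not the evaluation code used for the reported numbers: the function name `averaged_pass_at_1` and the toy correctness flags are hypothetical, and it assumes each sampled answer has already been graded as correct or incorrect.

```python
from statistics import mean


def averaged_pass_at_1(flags_per_problem):
    """Return pass@1 in percent, averaged over samples and then over problems.

    flags_per_problem: one list per problem, holding a boolean per sampled
    answer (True if that sample was graded correct). Hypothetical helper.
    """
    # Per-problem accuracy: fraction of sampled answers that are correct.
    per_problem = [
        mean(1.0 if ok else 0.0 for ok in flags) for flags in flags_per_problem
    ]
    # Benchmark score: mean of the per-problem accuracies, as a percentage.
    return 100.0 * mean(per_problem)


# Toy data (made up for illustration): 2 problems, 4 samples each.
toy_flags = [
    [True, True, False, True],    # 3 of 4 correct -> 0.75
    [False, False, True, False],  # 1 of 4 correct -> 0.25
]
score = averaged_pass_at_1(toy_flags)
print(f"pass@1 = {score:.1f}%")
```

With 32 samples per problem, as in the evaluation above, each inner list would hold 32 flags. Averaging over many samples reduces the variance that a single sampled answer per problem would introduce on a small benchmark.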