metadata

license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-32B/blob/main/LICENSE
language:
  - en
  - zh
pipeline_tag: text-generation
datasets:
  - PCL-Reasoner/V1.5-RL-Math
metrics:
  - accuracy
base_model:
  - Qwen/Qwen2.5-32B
tags:
  - math
model-index:
  - name: PCL-Reasoner/V1.5
    results:
      - task:
          type: text-generation
        dataset:
          name: AIME 2024
          type: Aime24
        metrics:
          - name: pass@1
            type: accuracy
            value: 90.9
      - task:
          type: text-generation
        dataset:
          name: AIME 2025
          type: Aime25
        metrics:
          - name: pass@1
            type: accuracy
            value: 85.6

PCL-Reasoner-V1.5

Model Overview

We present PCL-Reasoner-V1.5, a 32-billion-parameter large language model (LLM) for mathematical reasoning. The model is built upon Qwen2.5-32B and refined via supervised fine-tuning (SFT) followed by reinforcement learning (RL). A central innovation is our proposed offline RL method, which provides superior training stability and efficiency over standard online RL methods such as GRPO. Our model achieves state-of-the-art performance among models post-trained on Qwen2.5-32B, attaining average accuracies of 90.9% on AIME 2024 and 85.6% on AIME 2025. Our work demonstrates offline RL as a stable and efficient paradigm for advancing reasoning in LLMs. All experiments were conducted on Huawei Ascend 910C NPUs. Both training and evaluation processes utilize FP16 precision to maintain numerical accuracy.

Code

GitHub Repository

RL Dataset

Hugging Face Dataset: PCL-Reasoner/V1.5-RL-Math

Evaluation

All results are pass@1 accuracy (%), computed as the mean over 32 independently sampled responses per problem to reduce sampling variance.

Model Scale	Model	AIME 24	AIME 25
>100B	DeepSeek-R1	79.8	70.0
>100B	DeepSeek-R1-0528	91.4	87.5
>100B	Qwen3-235B-A22B	85.7	81.5
>100B	OpenAI-o3	91.6	88.9
>100B	Gemini-2.5-Pro-0506	90.8	83.0
32B	Qwen3-32B	81.4	72.9
32B	QwQ-32B	79.5	69.5
32B	DeepSeek-R1-Distill-Qwen-32B	72.6	49.6
32B	Skywork-OR1-32B	82.2	73.3
32B	AM-Thinking-v1	85.3	74.4
32B	OpenReasoning-Nemotron-32B	89.2	84.2
32B	PCL-Reasoner-v1	85.7	84.2
32B	PCL-Reasoner-v1.5	90.9	85.6
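
The pass@1 averaging used for these scores can be sketched as follows. This is a minimal illustration, not the project's evaluation code: the function name `pass_at_1` and the input layout (one list of per-sample correctness flags per problem) are assumptions made for the example, and the toy data uses 4 samples per problem instead of 32.

```python
from typing import Sequence


def pass_at_1(samples_correct: Sequence[Sequence[bool]]) -> float:
    """Return average pass@1 in percent.

    samples_correct[i][j] is True when the j-th sampled response to
    problem i was graded correct (hypothetical input layout).
    """
    if not samples_correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for flags in samples_correct:
        if not flags:
            raise ValueError("every problem needs at least one sample")
        # Fraction of correct samples estimates pass@1 for this problem.
        per_problem.append(sum(flags) / len(flags))
    # Benchmark score is the mean over problems, expressed as a percentage.
    return 100.0 * sum(per_problem) / len(per_problem)


# Toy data: 2 problems, 4 samples each (the reported setup uses 32).
toy = [
    [True, True, False, True],     # 3/4 correct
    [False, False, True, False],   # 1/4 correct
]
score = pass_at_1(toy)
print(f"{score:.1f}")  # 50.0
```

Averaging over many samples per problem matters on AIME because each year has only 30 problems, so a single sampled run can shift the score by several points.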

Citation

@article{PCL-Reasoner-v1.5,
  title={PCL-Reasoner-V1.5: Advancing Math Reasoning with Offline Reinforcement Learning},
  author={Yao Lu and Dengdong Fan and Jianzheng Nie and Fan Xu and Jie Chen and Bin Zhou and Yonghong Tian},
  journal={arXiv preprint arXiv:2601.14716},
  year={2026}
}