language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- qwen3
- reward-model
- text-classification
base_model: Qwen/Qwen3-8B
pipeline_tag: text-classification
arxiv: 2601.21912
Model Card for ProRAG-PRM
This is the Process Reward Model (PRM) associated with the ProRAG project. It is fine-tuned from Qwen/Qwen3-8B to evaluate the quality of intermediate reasoning steps.
It follows the methodology described in the accompanying paper (arXiv:2601.21912).
Model Details
- Base Model: Qwen3-8B
- Type: Process Reward Model (PRM) / Sequence Classification
- Task: Step-by-step Reasoning Evaluation
- Paper: https://arxiv.org/abs/2601.21912
💻 Code & Inference
This model assigns reward scores to individual reasoning steps.
For the specific scoring logic, data formatting (e.g., how to mark steps), and inference scripts, please refer to our GitHub repository:
👉 Click here to view the GitHub Repository
(Use the scoring script provided in the repository: standard Hugging Face pipelines may not interpret the process rewards correctly without the repository's input formatting.)
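For orientation only, here is a minimal sketch of how step-level scores from a process reward model are commonly consumed. It is not the ProRAG scoring script. Everything in it is an assumption made for illustration: the newline step separator, the cumulative-prefix input layout, the helper names (`build_step_prefixes`, `score_steps`, `aggregate`), the aggregation modes, and the toy `score_fn`, which stands in for a real call into the model. The actual step markers and scoring logic are defined in the GitHub repository.

```python
from typing import Callable, List, Sequence


def build_step_prefixes(question: str, steps: Sequence[str], sep: str = "\n") -> List[str]:
    """Return one input per step: the question plus all steps up to and including it.

    The separator and prefix layout are hypothetical, not the repository's format.
    """
    return [question + sep + sep.join(steps[:i]) for i in range(1, len(steps) + 1)]


def score_steps(
    question: str,
    steps: Sequence[str],
    score_fn: Callable[[str], float],
) -> List[float]:
    """Score every step prefix with a caller-supplied scoring function."""
    return [float(score_fn(prefix)) for prefix in build_step_prefixes(question, steps)]


def aggregate(step_scores: Sequence[float], how: str = "min") -> float:
    """Collapse per-step scores into one trajectory-level score."""
    if not step_scores:
        raise ValueError("step_scores must not be empty")
    if how == "min":
        return min(step_scores)
    if how == "mean":
        return sum(step_scores) / len(step_scores)
    if how == "last":
        return step_scores[-1]
    raise ValueError(f"unknown aggregation: {how}")


def toy_score_fn(text: str) -> float:
    """Placeholder scorer (NOT the model): shorter prefixes get higher scores."""
    return 1.0 / (1 + text.count("\n"))


if __name__ == "__main__":
    scores = score_steps("Q", ["step one", "step two"], toy_score_fn)
    print(scores, aggregate(scores, how="min"))
```

In real use, `toy_score_fn` would be replaced by a function that formats the prefix as the repository prescribes, runs the model, and returns the step reward. Taking the minimum treats a trajectory as only as good as its weakest step; whether that matches the paper's usage should be checked against the repository.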
Citation
If you use this model or the associated paper in your research, please cite:
@misc{wang2026proragprocesssupervisedreinforcementlearning,
title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation},
author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
year={2026},
eprint={2601.21912},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2601.21912},
}