---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- qwen3
- reward-model
- text-classification
base_model: Qwen/Qwen3-8B
pipeline_tag: text-classification
arxiv: 2601.21912
---
# Model Card for ProRAG-PRM
This is the **Process Reward Model (PRM)** associated with the ProRAG project. It is fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to evaluate the quality of intermediate reasoning steps.
It implements the methodology described in the paper at arXiv: **2601.21912**.
## Model Details
- **Base Model:** Qwen3-8B
- **Type:** Process Reward Model (PRM) / Sequence Classification
- **Task:** Step-by-step Reasoning Evaluation
- **Paper:** [View on arXiv](https://arxiv.org/abs/2601.21912)
## 💻 Code & Inference
This model is designed to assign rewards/scores to reasoning steps.
For the specific scoring logic, data formatting (e.g., how step boundaries are marked), and inference scripts, please refer to our GitHub repository:
👉 **[Click here to view the GitHub Repository](https://github.com/lilinwz/ProRAG/tree/main)**
*(Please ensure you use the correct scoring script provided in the repo, as standard Hugging Face pipelines may not interpret the process rewards correctly without specific formatting.)*
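As a rough illustration of the general PRM scoring pattern, the sketch below converts per-step classification logits into scalar rewards with a softmax. This is **not** the official ProRAG scoring script: the two-class (`[incorrect, correct]`) head, the logit values, and the step layout are all assumptions for illustration; the authoritative formatting and scoring logic live in the GitHub repository linked above.

```python
# Illustrative sketch only: assumes a two-class classification head
# ([incorrect, correct]) producing one logit pair per reasoning step.
# The actual head layout and step format are defined in the ProRAG repo.
import math


def step_reward(logits):
    """Map a (negative, positive) logit pair to P(step is correct)
    via a numerically stable two-class softmax."""
    neg, pos = logits
    m = max(neg, pos)  # subtract the max before exponentiating for stability
    e_neg, e_pos = math.exp(neg - m), math.exp(pos - m)
    return e_pos / (e_neg + e_pos)


# Hypothetical per-step logits, one pair per reasoning step.
per_step_logits = [(-1.2, 2.3), (0.4, 0.1), (-2.0, 1.5)]
rewards = [step_reward(pair) for pair in per_step_logits]
print(rewards)  # one scalar reward in (0, 1) per step
```

In a real pipeline, these logit pairs would come from the model's classification head evaluated at each marked step position; the repository's scripts handle extracting them with the correct input formatting.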
## Citation
If you use this model or the associated paper in your research, please cite:
```bibtex
@misc{wang2026proragprocesssupervisedreinforcementlearning,
  title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation},
  author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
  year={2026},
  eprint={2601.21912},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.21912},
}
```