---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- qwen3
- reward-model
- text-classification
base_model: Qwen/Qwen3-8B
pipeline_tag: text-classification
arxiv: 2601.21912
---

# Model Card for ProRAG-PRM

This is the **Process Reward Model (PRM)** associated with the ProRAG project. It is fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to evaluate the quality of intermediate reasoning steps.

It is based on the methodology described in the paper with arXiv ID **2601.21912**.

## Model Details

- **Base Model:** Qwen3-8B
- **Type:** Process Reward Model (PRM) / Sequence Classification
- **Task:** Step-by-step reasoning evaluation
- **Paper:** [View on arXiv](https://arxiv.org/abs/2601.21912)

## 💻 Code & Inference

This model assigns reward scores to intermediate reasoning steps.

For the specific scoring logic, data formatting (e.g., how to mark step boundaries), and inference scripts, please refer to our GitHub repository:

👉 **[ProRAG on GitHub](https://github.com/lilinwz/ProRAG/tree/main)**

*(Please use the scoring script provided in the repo: standard Hugging Face pipelines may not interpret the process rewards correctly without the expected input formatting.)*
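
As a rough orientation only, the sketch below shows one plausible way to turn a sequence-classification head's logits into a per-step reward. The two-class head layout, the step-delimiter handling, and the repo id `lilinwz/ProRAG-PRM` in the comments are all assumptions for illustration, not confirmed details of this model; the authoritative scoring logic lives in the GitHub repository above.

```python
# Hypothetical scoring sketch for a PRM exposed as sequence classification.
# ASSUMPTION: a two-class head (index 0 = bad step, index 1 = good step);
# the reward is the probability assigned to the "good" class.
import torch


def step_reward(logits: torch.Tensor) -> float:
    """Map classifier logits for one reasoning step to a scalar reward in [0, 1]."""
    probs = torch.softmax(logits, dim=-1)
    return probs[..., 1].item()


# Model loading is commented out so the sketch stays lightweight; the repo id
# "lilinwz/ProRAG-PRM" is a placeholder, not a confirmed identifier.
# from transformers import AutoTokenizer, AutoModelForSequenceClassification
# tok = AutoTokenizer.from_pretrained("lilinwz/ProRAG-PRM")
# model = AutoModelForSequenceClassification.from_pretrained("lilinwz/ProRAG-PRM")
# inputs = tok(question_plus_step, return_tensors="pt")
# reward = step_reward(model(**inputs).logits[0])

# Demonstration on dummy logits: equal logits give an uninformative 0.5 reward.
print(step_reward(torch.tensor([0.0, 0.0])))  # → 0.5
```

Because the repo's scripts define how steps are delimited in the input text, treat this only as a shape for post-processing the logits, not as a drop-in inference pipeline.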

## Citation

If you use this model or the associated paper in your research, please cite:

```bibtex
@misc{wang2026proragprocesssupervisedreinforcementlearning,
      title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation},
      author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
      year={2026},
      eprint={2601.21912},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.21912},
}
```