---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- qwen3
- reward-model
- text-classification
base_model: Qwen/Qwen3-8B
pipeline_tag: text-classification
arxiv: 2601.21912
---

# Model Card for ProRAG-PRM

This is the **Process Reward Model (PRM)** associated with the ProRAG project. It is fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to evaluate the quality of intermediate reasoning steps.

The model follows the methodology described in the paper with arXiv ID **2601.21912**.

## Model Details

- **Base Model:** Qwen3-8B
- **Type:** Process Reward Model (PRM) / Sequence Classification
- **Task:** Step-by-step Reasoning Evaluation
- **Paper:** [View on arXiv](https://arxiv.org/abs/2601.21912)

## 💻 Code & Inference

This model assigns a reward score to each reasoning step. For the specific scoring logic, data formatting (e.g., how to mark steps), and inference scripts, please refer to our GitHub repository:

👉 **[Click here to view the GitHub Repository](https://github.com/lilinwz/ProRAG/tree/main)**

*(Please use the scoring script provided in the repo. Standard Hugging Face pipelines may not interpret the process rewards correctly without the repo's input formatting.)*

## Citation

If you use this model or the associated paper in your research, please cite:

```bibtex
@misc{wang2026proragprocesssupervisedreinforcementlearning,
  title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation},
  author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
  year={2026},
  eprint={2601.21912},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2601.21912},
}
```
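
## Example: Aggregating Step Scores

Because a PRM produces one score per reasoning step, downstream code usually has to collapse those scores into a single value for the whole reasoning chain. The sketch below illustrates that idea only. The function name `aggregate_step_scores`, the three aggregation rules, and the example scores are all hypothetical and are not taken from the ProRAG repository; the repo's scoring script remains the reference for how scores are actually produced and combined.

```python
from math import prod
from typing import Sequence


def aggregate_step_scores(step_scores: Sequence[float], method: str = "min") -> float:
    """Collapse per-step scores into one chain-level score.

    Hypothetical helper: the aggregation rules here are generic conventions,
    not necessarily the ones used by ProRAG.
    """
    if not step_scores:
        raise ValueError("step_scores must contain at least one score")
    if method == "min":
        # The chain is only as good as its weakest step.
        return min(step_scores)
    if method == "prod":
        # Treat scores as independent per-step correctness probabilities.
        return prod(step_scores)
    if method == "mean":
        # Average quality across all steps.
        return sum(step_scores) / len(step_scores)
    raise ValueError(f"unknown method: {method!r}")


# Made-up scores for a three-step reasoning chain (not real model output).
example_scores = [0.9, 0.4, 0.8]
print(aggregate_step_scores(example_scores, "min"))  # prints 0.4
```

Taking the minimum penalizes a chain for a single weak step, while the mean is more forgiving. Which rule fits best depends on how the scores are consumed, so treat the choice as an assumption to verify against the repository.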