metadata

language:
  - en
  - zh
license: apache-2.0
library_name: transformers
tags:
  - qwen3
  - reward-model
  - text-classification
base_model: Qwen/Qwen3-8B
pipeline_tag: text-classification
arxiv: 2601.21912

Model Card for ProRAG-PRM

ProRAG-PRM is the Process Reward Model (PRM) from the ProRAG project. It is fine-tuned from Qwen/Qwen3-8B to score the quality of intermediate reasoning steps.

It follows the methodology described in the ProRAG paper (arXiv:2601.21912).

Model Details

Base Model: Qwen3-8B
Type: Process Reward Model (PRM) / Sequence Classification
Task: Step-by-step Reasoning Evaluation
Paper: arXiv:2601.21912 (https://arxiv.org/abs/2601.21912)

💻 Code & Inference

This model is designed to assign rewards/scores to reasoning steps.

For the specific scoring logic, data formatting (e.g., how to mark steps), and inference scripts, please refer to our GitHub repository:

👉 Click here to view the GitHub Repository

(Use the scoring script provided in the repo. Standard Hugging Face pipelines may not interpret the process rewards correctly without the repo's specific formatting.)
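For orientation only, here is a minimal sketch of how a sequence-classification checkpoint like this one could be loaded and queried with plain transformers calls. It is not the project's scoring script. Several things in it are assumptions: the repository id `bmbgsj/ProRAG_PRM` is inferred from the hosting page, the helper names (`load_prm`, `logits_to_score`, `score_texts`) are hypothetical, and the mapping from logits to a score (sigmoid for a single logit, softmax probability of the last class otherwise) is a common convention that may not match ProRAG's. The input strings are expected to be already formatted as the GitHub repository prescribes; no step markers are invented here.

```python
import math
from typing import List, Sequence

# Assumption: repo id inferred from the hosting page; adjust if it differs.
MODEL_ID = "bmbgsj/ProRAG_PRM"


def load_prm(model_id: str = MODEL_ID):
    """Load the tokenizer and classification model with standard transformers calls."""
    # Imported lazily so the pure-Python helper below works without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" keeps the dtype stored in the checkpoint.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_id, torch_dtype="auto"
    )
    model.eval()
    return tokenizer, model


def logits_to_score(logits: Sequence[float]) -> float:
    """Map one row of classifier logits to a score in [0, 1].

    Assumption (not confirmed by the model card): a single logit is passed
    through a sigmoid; with several logits, the softmax probability of the
    last class is read as the "good step" probability.
    """
    if len(logits) == 1:
        return 1.0 / (1.0 + math.exp(-logits[0]))
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[-1] / sum(exps)


def score_texts(tokenizer, model, texts: List[str]) -> List[float]:
    """Score pre-formatted inputs one at a time (avoids padding configuration)."""
    import torch

    scores = []
    for text in texts:
        batch = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**batch).logits  # shape: (1, num_labels)
        scores.append(logits_to_score(logits[0].float().tolist()))
    return scores
```

A call would look like `tokenizer, model = load_prm()` followed by `score_texts(tokenizer, model, formatted_inputs)`, where `formatted_inputs` comes from the repository's own formatting code. If the numbers disagree with the official script, treat the official script as authoritative.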

Citation

If you use this model or the associated paper in your research, please cite:

@misc{wang2026proragprocesssupervisedreinforcementlearning,
      title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation}, 
      author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
      year={2026},
      eprint={2601.21912},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.21912}, 
}