---
language:
- en
- zh
license: apache-2.0
library_name: transformers
tags:
- qwen3
- reward-model
- text-classification
base_model: Qwen/Qwen3-8B
pipeline_tag: text-classification
arxiv: 2601.21912
---

# Model Card for ProRAG-PRM

This is the **Process Reward Model (PRM)** associated with the ProRAG project. It is fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to evaluate the quality of intermediate reasoning steps.

It is based on the methodology described in the paper with arXiv ID **2601.21912**.

## Model Details

- **Base Model:** Qwen3-8B
- **Type:** Process Reward Model (PRM) / Sequence Classification
- **Task:** Step-by-step reasoning evaluation
- **Paper:** [View on arXiv](https://arxiv.org/abs/2601.21912)

## 💻 Code & Inference

This model assigns reward scores to intermediate reasoning steps.

For the specific scoring logic, data formatting (e.g., how to mark step boundaries), and inference scripts, please refer to our GitHub repository:

👉 **[ProRAG on GitHub](https://github.com/lilinwz/ProRAG/tree/main)**

*(Please use the scoring script provided in the repo: standard Hugging Face pipelines may not interpret the process rewards correctly without the expected input formatting.)*
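
As a rough orientation only, the sketch below shows one plausible way to turn a sequence-classification head's logits into a per-step reward. The two-class head layout, the step-delimiter handling, and the repo id `lilinwz/ProRAG-PRM` in the comments are all assumptions for illustration, not confirmed details of this model; the authoritative scoring logic lives in the GitHub repository above.

```python
# Hypothetical scoring sketch for a PRM exposed as sequence classification.
# ASSUMPTION: a two-class head (index 0 = bad step, index 1 = good step);
# the reward is the probability assigned to the "good" class.
import torch


def step_reward(logits: torch.Tensor) -> float:
    """Map classifier logits for one reasoning step to a scalar reward in [0, 1]."""
    probs = torch.softmax(logits, dim=-1)
    return probs[..., 1].item()


# Model loading is commented out so the sketch stays lightweight; the repo id
# "lilinwz/ProRAG-PRM" is a placeholder, not a confirmed identifier.
# from transformers import AutoTokenizer, AutoModelForSequenceClassification
# tok = AutoTokenizer.from_pretrained("lilinwz/ProRAG-PRM")
# model = AutoModelForSequenceClassification.from_pretrained("lilinwz/ProRAG-PRM")
# inputs = tok(question_plus_step, return_tensors="pt")
# reward = step_reward(model(**inputs).logits[0])

# Demonstration on dummy logits: equal logits give an uninformative 0.5 reward.
print(step_reward(torch.tensor([0.0, 0.0])))  # → 0.5
```

Because the repo's scripts define how steps are delimited in the input text, treat this only as a shape for post-processing the logits, not as a drop-in inference pipeline.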

## Citation

If you use this model or the associated paper in your research, please cite:

```bibtex
@misc{wang2026proragprocesssupervisedreinforcementlearning,
      title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation},
      author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
      year={2026},
      eprint={2601.21912},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.21912},
}
```