bmbgsj/ProRAG_PRM

@@ -1,57 +1,53 @@
 ---
 library_name: transformers
-model_name: prm
 tags:
-- generated_from_trainer
-- reward-trainer
-- trl
-licence: license
 ---
-# Model Card for prm
-This model is a fine-tuned version of [None](https://huggingface.co/None).
-It has been trained using [TRL](https://github.com/huggingface/trl).
-## Quick start
-```python
-from transformers import pipeline
-text = "The capital of France is Paris."
-rewarder = pipeline(model="None", device="cuda")
-output = rewarder(text)[0]
-print(output["score"])
-```
-## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/lilin22/zhaowang/runs/7233254701.72867-f9dda944-4408)
-This model was trained with Reward.
-### Framework versions
-- TRL: 0.26.2
-- Transformers: 4.57.3
-- Pytorch: 2.8.0
-- Datasets: 4.4.2
-- Tokenizers: 0.22.1
-## Citations
-Cite TRL as:
 ```bibtex
-@misc{vonwerra2022trl,
-	title        = {{TRL: Transformer Reinforcement Learning}},
-	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
-	year         = 2020,
-	journal      = {GitHub repository},
-	publisher    = {GitHub},
-	howpublished = {\url{https://github.com/huggingface/trl}}
 }
 ```

 ---
+language:
+- en
+- zh
+license: apache-2.0
 library_name: transformers
 tags:
+- qwen3
+- reward-model
+- text-classification
+base_model: Qwen/Qwen3-8B
+pipeline_tag: text-classification
+arxiv: 2601.21912
 ---
+# Model Card for ProRAG-PRM
+This is the **Process Reward Model (PRM)** from the ProRAG project. It is fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to score the quality of intermediate reasoning steps.
+It follows the methodology described in the ProRAG paper ([arXiv:2601.21912](https://arxiv.org/abs/2601.21912)).
+## Model Details
+- **Base Model:** Qwen3-8B
+- **Type:** Process Reward Model (PRM) / Sequence Classification
+- **Task:** Step-by-step Reasoning Evaluation
+- **Paper:** [View on arXiv](https://arxiv.org/abs/2601.21912)
+## 💻 Code & Inference
+This model assigns a reward score to each intermediate reasoning step.
+The scoring logic, the input format (e.g., how steps are marked), and the inference scripts are in our GitHub repository:
+👉 **[ProRAG on GitHub](https://github.com/lilinwz/ProRAG/tree/main)**
+*(Use the scoring script provided in the repo: standard Hugging Face pipelines may not interpret the process rewards correctly unless the input follows the repo's format.)*
+## Citation
+If you use this model or the associated paper in your research, please cite:
 ```bibtex
+@misc{wang2026proragprocesssupervisedreinforcementlearning,
+      title={ProRAG: Process-Supervised Reinforcement Learning for Retrieval-Augmented Generation},
+      author={Zhao Wang and Ziliang Zhao and Zhicheng Dou},
+      year={2026},
+      eprint={2601.21912},
+      archivePrefix={arXiv},
+      primaryClass={cs.AI},
+      url={https://arxiv.org/abs/2601.21912},
 }
 ```
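
The Code & Inference section above says the model scores reasoning steps one at a time and defers the exact input format to the GitHub repository. As a rough illustration only, the sketch below shows one common way to score steps with a sequence-classification reward model: each step is scored on the text of the question plus all steps up to and including it. Everything here is an assumption rather than the ProRAG method: the helper names (`build_step_prefixes`, `score_steps`, `load_hf_scorer`) are hypothetical, the newline separator is a placeholder for whatever step markers the repo actually uses, and the single-logit reward head is inferred from the "Sequence Classification" type listed in the card. For real evaluation, use the repo's scoring script.

```python
from typing import Callable, List, Sequence


def build_step_prefixes(question: str, steps: Sequence[str], sep: str = "\n") -> List[str]:
    """Return one text per step: the question plus every step up to that one.

    The separator is a placeholder; the ProRAG repo defines the real step format.
    """
    return [sep.join([question, *steps[:i]]) for i in range(1, len(steps) + 1)]


def score_steps(
    question: str,
    steps: Sequence[str],
    score_fn: Callable[[str], float],
) -> List[float]:
    """Apply a text -> score function to each cumulative prefix, one score per step."""
    return [float(score_fn(text)) for text in build_step_prefixes(question, steps)]


def load_hf_scorer(model_id: str) -> Callable[[str], float]:
    """Hypothetical loader: wrap a Hugging Face sequence-classification model
    as a text -> score function. Assumes the head emits a single logit.
    Device placement and dtype selection are omitted for brevity.
    """
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def score(text: str) -> float:
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assumption: logits has shape (1, 1) and the raw logit is the reward.
        return logits[0, 0].item()

    return score


if __name__ == "__main__":
    # Model ID taken from the repository name in the page header.
    scorer = load_hf_scorer("bmbgsj/ProRAG_PRM")
    rewards = score_steps(
        "Which city is the capital of France?",
        ["Search for the capital of France.", "The retrieved passage says Paris."],
        scorer,
    )
    print(rewards)
```

Keeping `score_fn` as a plain callable means the prefix logic can be checked with a stub before an 8B model is loaded, and the repo's own scoring function can be swapped in without touching the rest.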