Add library_name, pipeline_tag, and project page link #2

Opened by nielsr (HF Staff)

README.md (changed)
@@ -1,7 +1,13 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
 # Universal-PRM-7B
+
+Project page: https://auroraprm.github.io/
+
 ## 1. Overview
 Universal-PRM is trained using Qwen2.5-Math-7B-Instruct as the base. The training process incorporates diverse policy distributions, ensemble prompting, and reverse verification to enhance generalization and robustness. It achieves state-of-the-art performance on ProcessBench and the internally developed UniversalBench.
 ## 2. Experiments
@@ -75,5 +81,4 @@ with torch.no_grad():
 judge_list_infer.append(reward)
 
 print(judge_list_infer) # [0.73828125, 0.7265625, 0.73046875, 0.73828125, 0.734375]
-
-```
+```
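Hub model card metadata lives in the YAML block between the two `---` lines at the top of README.md, and the Hub reads keys such as `library_name` and `pipeline_tag` from it. Apart from the project page line and some blank lines, this PR only touches that block. The sketch below is a minimal, stdlib-only illustration that pulls the flat `key: value` pairs out of the new front matter. It is not how the Hub itself parses model cards. It assumes one scalar value per line, so nested YAML metadata would need a real YAML parser. The names `read_front_matter` and `README_HEAD` are hypothetical.

```python
from typing import Dict

# Hypothetical sample: mirrors the first lines of the updated README.md.
README_HEAD = """---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# Universal-PRM-7B
"""


def read_front_matter(text: str) -> Dict[str, str]:
    """Return flat `key: value` pairs from a leading `---` ... `---` block.

    Simplified sketch (assumption): one scalar per line; nested YAML,
    lists, and quoted values are not handled.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # closing delimiter ends the metadata block
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            meta[key.strip()] = value.strip()
    return meta


meta = read_front_matter(README_HEAD)
print(meta["library_name"], meta["pipeline_tag"])  # transformers text-generation
```

Before this PR, only `license` would have been found in the block.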