Update README.md
README.md
CHANGED
@@ -8,12 +8,12 @@ base_model:

 # Introduction

-We propose GenPRM
+We propose **GenPRM**, a strong generative process reward model with the following features:

-- reasoning with explicit CoT and code verification before providing the process judgment;
-- improving Monte Carlo estimation and hard label with Relative Progress Estimation (RPE)
-- supporting GenPRM test-time scaling in a parallel manner with majority voting;
-- supporting policy model test-time scaling with GenPRM as verifiers or critics
+- reasoning with explicit **CoT reasoning** and **code verification** before providing the process judgment;
+- improving Monte Carlo estimation and hard label with **Relative Progress Estimation (RPE)**;
+- supporting GenPRM **test-time scaling** in a parallel manner with majority voting;
+- supporting policy model test-time scaling with GenPRM as **verifiers** or **critics**.

 GenPRM achieves state-of-the-art performance across multiple benchmarks in two key roles:

@@ -26,8 +26,8 @@ GenPRM achieves state-of-the-art performance across multiple benchmarks in two k
 - Paper: [https://arxiv.org/abs/2504.00891](https://arxiv.org/abs/2504.00891)
 - Code: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM)
 - Awesome Process Reward Models: [Awesome Process Reward Models](https://github.com/RyanLiu112/Awesome-Process-Reward-Models)
-- HF Paper Link
-- HF Collection
+- HF Paper Link: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://hf.co/papers/2504.00891)
+- HF Collection: [GenPRM](https://hf.co/collections/GenPRM/genprm-67ee4936234ba5dd16bb9943)

 # Model details
