Safetensors
English
qwen2
RyanLiu112 committed · Commit ad84702 · verified
1 Parent(s): 947c96c

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -8,12 +8,12 @@ base_model:
 
  # Introduction
 
- We propose GenPRM, a strong generative process reward model with the following features:
+ We propose **GenPRM**, a strong generative process reward model with the following features:
 
- - reasoning with explicit CoT and code verfication before providing the process judgment;
- - improving Monte Carlo estimation and hard label with Relative Progress Estimation (RPE);
- - supporting GenPRM test-time scaling in a parallel manner with majority voting;
- - supporting policy model test-time scaling with GenPRM as verifiers or critics.
+ - reasoning with explicit **CoT reasoning** and **code verfication** before providing the process judgment;
+ - improving Monte Carlo estimation and hard label with **Relative Progress Estimation (RPE)**;
+ - supporting GenPRM **test-time scaling** in a parallel manner with majority voting;
+ - supporting policy model test-time scaling with GenPRM as **verifiers** or **critics**.
 
  GenPRM achieves state-of-the-art performance across multiple benchmarks in two key roles:
 
@@ -26,8 +26,8 @@ GenPRM achieves state-of-the-art performance across multiple benchmarks in two k
  - Paper: [https://arxiv.org/abs/2504.00891](https://arxiv.org/abs/2504.00891)
  - Code: [https://github.com/RyanLiu112/GenPRM](https://github.com/RyanLiu112/GenPRM)
  - Awesome Process Reward Models: [Awesome Process Reward Models](https://github.com/RyanLiu112/Awesome-Process-Reward-Models)
- - HF Paper Link[GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://hf.co/papers/2504.00891)
- - HF Collection[GenPRM](https://hf.co/collections/GenPRM/genprm-67ee4936234ba5dd16bb9943)
+ - HF Paper Link: [GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning](https://hf.co/papers/2504.00891)
+ - HF Collection: [GenPRM](https://hf.co/collections/GenPRM/genprm-67ee4936234ba5dd16bb9943)
 
  # Model details
 