Update README.md
README.md CHANGED

@@ -130,7 +130,9 @@ We use codes in [Implicit PRM](https://github.com/PRIME-RL/ImplicitPRM/tree/main

 ### Evaluation Base Model

-
+For **Best-of-N Sampling**, we adopt **Eurus-2-7B-SFT**, **Qwen2.5-7B-Instruct**, and **Llama-3.1-70B-Instruct** as generation models to evaluate the performance of our implicit PRM. For all models, we set the sampling temperature to 0.5 and the top-*p* threshold to 1.
+
+For **ProcessBench**, we adopt **Math-Shepherd-PRM-7B**, **RLHFlow-PRM-Mistral-8B**, **RLHFlow-PRM-Deepseek-8B**, **Skywork-PRM-7B**, **EurusPRM-Stage 1**, and **EurusPRM-Stage 2**.

 ### Best-of-N Sampling
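The Best-of-N procedure the added lines describe can be sketched as follows. This is a minimal illustration, not the repository's actual evaluation code: `generate_candidates` and `prm_score` are hypothetical stand-ins for sampling from one of the generation models (temperature 0.5, top-*p* = 1) and for the implicit PRM's scalar response score.

```python
import random


def generate_candidates(prompt, n, temperature=0.5, top_p=1.0):
    # Hypothetical stand-in for drawing n responses from a generation
    # model (e.g. Eurus-2-7B-SFT) at the stated temperature and top-p.
    return [f"{prompt} -> candidate {i}" for i in range(n)]


def prm_score(prompt, response):
    # Hypothetical stand-in for the implicit PRM's score of a response.
    return random.random()


def best_of_n(prompt, n=16):
    """Sample n candidates and keep the one the PRM scores highest."""
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda r: prm_score(prompt, r))


if __name__ == "__main__":
    print(best_of_n("Solve 2 + 2.", n=4))
```

The same selection loop applies to any of the three generation models; only the sampler behind `generate_candidates` changes.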