Update README.md
README.md
CHANGED
@@ -11,11 +11,11 @@ base_model:
 library_name: transformers
 ---

-# Model Card for
+# Model Card for Llama-3.1-70B-Instruct-RM-RB2

 <!-- Provide a quick summary of what the model is/does. -->

-
+Llama-3.1-70B-Instruct-RM-RB2 is one of 7 sets of reward models (RMs) released with Reward Bench 2.
 We have released a large set of 70 total reward model checkpoints that we used to develop the benchmark and correlate it with downstream PPO / Best-of-N performance.

 [Models](https://huggingface.co/collections/allenai/reward-bench-2-683d2612a4b3e38a3e53bb51) | [Code](https://github.com/allenai/reward-bench) | [Eval. Dataset v2](https://huggingface.co/datasets/allenai/reward-bench-2) | [Results v2](https://huggingface.co/datasets/allenai/reward-bench-2-results) | [Paper](https://github.com/allenai/reward-bench/blob/main/paper-v2.pdf)
@@ -44,7 +44,7 @@ rm = AutoModelForSequenceClassification("allenai/Llama-3.1-70B-Instruct-RM-RB2",
 - **Training code:** https://github.com/allenai/open-instruct
 - **Language(s) (NLP):** en
 - **License:** Llama 3.1 Community License Agreement
-- **Finetuned from model [
+- **Finetuned from model:** [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)

 ## License
