Update README.md
README.md
CHANGED
@@ -5,6 +5,7 @@ This preference model is trained from [LLaMA3-8B-it](meta-llama/Meta-Llama-3-8B-

 The dataset is RLHFlow/pair_preference_model_dataset. It achieves Chat 98.6, Chat-Hard 65.8, Safety 89.6, and Reasoning 94.9 on RewardBench.

+See our paper [RLHF Workflow: From Reward Modeling to Online RLHF](https://arxiv.org/abs/2405.07863) for more details on this model.

 ## Service the RM

@@ -62,4 +63,26 @@ for chosen_position in [0, 1]:
 avg_prob_chosen = np.mean(probs_chosen)
 correct = 0.5 if avg_prob_chosen == 0.5 else float(avg_prob_chosen > 0.5)
 print(correct)
+```
+
+## Citation
+If you use this model in your research, please consider citing our paper
+```
+@misc{rlhflow,
+title={RLHF Workflow: From Reward Modeling to Online RLHF},
+author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
+year={2024},
+eprint={2405.07863},
+archivePrefix={arXiv},
+primaryClass={cs.LG}
+}
+```
+and Google's SLiC-HF paper (which first proposed this pairwise preference model)
+```
+@article{zhao2023slic,
+title={Slic-hf: Sequence likelihood calibration with human feedback},
+author={Zhao, Yao and Joshi, Rishabh and Liu, Tianqi and Khalman, Misha and Saleh, Mohammad and Liu, Peter J},
+journal={arXiv preprint arXiv:2305.10425},
+year={2023}
+}
 ```