Update README.md
README.md
CHANGED
@@ -15,10 +15,11 @@ fine-tuned [GPT2](https://huggingface.co/gpt2) to a reward model.
 
 The model is designed to generate human-like responses to questions in [Stack Exchange](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) domains of programming, mathematics, physics, and more.
 
-For
+For training code check the github [example](https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py).
 
 info:
 * epoch: 1.0
 * train_loss: 0.641692199903866
 * eval_loss: 0.6299035549163818
 * eval_accuracy: 0.729
+