| | --- |
| | license: apache-2.0 |
| | datasets: |
| | - lvwerra/stack-exchange-paired |
| | language: |
| | - en |
| | library_name: adapter-transformers |
| | pipeline_tag: text-generation |
| | tags: |
| | - reward_model |
| | --- |
## Reward Model GPT2
|
This is [GPT2](https://huggingface.co/gpt2) fine-tuned as a reward model.
|
The model is designed to score answers to questions in [Stack Exchange](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) domains (programming, mathematics, physics, and more), rating preferred, human-like responses above less-preferred ones.
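A reward model like this one is typically trained on preference pairs, where the preferred answer to a question should receive a higher scalar reward than the rejected one. The sketch below assumes the InstructGPT-style pairwise log loss, `-log(sigmoid(r_chosen - r_rejected))`, which is what the training script linked below appears to use. `pairwise_loss`, `mean_pairwise_loss`, and the example scores are hypothetical and are not outputs of this model.

```python
import math
from typing import Iterable, Tuple


def softplus(x: float) -> float:
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise log loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the preferred answer scores well above the rejected
    one, and large when the ranking is reversed.
    """
    margin = reward_chosen - reward_rejected
    # Uses the identity -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m).
    return softplus(-margin)


def mean_pairwise_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the pairwise loss over (reward_chosen, reward_rejected) pairs."""
    pairs = list(pairs)
    return sum(pairwise_loss(c, r) for c, r in pairs) / len(pairs)


# Hypothetical reward scores, not produced by this model.
pairs = [(1.2, -0.3), (0.1, 0.4), (2.0, 1.0)]
print(mean_pairwise_loss(pairs))
```

If both answers receive the same score, the loss is ln 2 ≈ 0.693. Values below that mean the model ranks the preferred answers higher more often than not.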
|
For the training code, see the [example script](https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py) in the TRL GitHub repository.
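Each row of the paired dataset holds a question, a preferred answer, and a less-preferred answer. Both answers are scored using the same prompt template. The sketch below shows that pairing step in plain Python. The column names (`question`, `response_j` as preferred, `response_k` as rejected) and the `Question: ...\n\nAnswer: ...` template are assumptions based on the linked example. `format_example` and `build_pair` are hypothetical helpers and are not part of TRL.

```python
from typing import Dict, Tuple


def format_example(question: str, answer: str) -> str:
    """Join a question and one candidate answer into a single text to score.

    The template is assumed from the stack_llama reward-modeling example.
    """
    return "Question: " + question + "\n\nAnswer: " + answer


def build_pair(row: Dict[str, str]) -> Tuple[str, str]:
    """Return (chosen_text, rejected_text) for one dataset row.

    Assumes `response_j` is the preferred answer and `response_k` the rejected one.
    """
    chosen = format_example(row["question"], row["response_j"])
    rejected = format_example(row["question"], row["response_k"])
    return chosen, rejected


# Hypothetical row in the assumed column layout.
row = {
    "question": "How do I reverse a list in Python?",
    "response_j": "Use my_list[::-1] or my_list.reverse().",
    "response_k": "You can't.",
}
chosen, rejected = build_pair(row)
print(chosen)
```

Both texts share the same question and template. As a result, the difference between their rewards reflects the answers rather than the formatting.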
|
Training results:
* epoch: 1.0
* train_loss: 0.641692199903866
* eval_loss: 0.6299035549163818
* eval_accuracy: 0.729
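For a pairwise reward model, `eval_accuracy` is most naturally read as the share of held-out preference pairs in which the preferred answer gets the higher reward. Under that reading, 0.729 means about 73% of evaluation pairs are ranked the same way as the human preference, compared with 0.5 for random scoring. The minimal sketch below assumes the metric is computed over (chosen, rejected) reward pairs. `pairwise_accuracy` and the scores are hypothetical.

```python
from typing import Sequence, Tuple


def pairwise_accuracy(pairs: Sequence[Tuple[float, float]]) -> float:
    """Fraction of (reward_chosen, reward_rejected) pairs ranked correctly.

    A pair counts as correct only if the chosen answer scores strictly higher.
    Ties count as incorrect in this sketch.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return correct / len(pairs)


# Hypothetical reward scores for four evaluation pairs.
scores = [(0.9, 0.2), (-0.1, 0.3), (1.5, 1.4), (0.0, -2.0)]
print(pairwise_accuracy(scores))  # prints 0.75
```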