Commit 16e2e2d
Parent(s): a235e2c

Update README.md

README.md CHANGED
@@ -25,40 +25,20 @@ The `RewardModel` is a [BERT](https://huggingface.co/bert-base-cased) model that

  The model was trained with a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.

- These prompt + completions are samples of intruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
-
  ## Details

  - **Size:** 109,038,209 parameters
  - **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
  - **Language:** English
- - **Number of
  - **Batch size:** 42
  - **Optimizer:** `torch.optim.AdamW`
  - **Learning Rate:** 5e-5
  - **GPU:** 1 NVIDIA A100-SXM4-40GB
- - **Emissions:** 0.
- - **Total Energy Consumption:** 0.
-
- | Step | Training Loss | Validation Loss | Accuracy |
- |---|---|---|---|
- | 200 | 0.080300 | 0.037106 | 0.987499 |
- | 400 | 0.039300 | 0.036421 | 0.988433 |
- | 600 | 0.037200 | 0.041799 | 0.986447 |
- | 800 | 0.011400 | 0.039411 | 0.989602 |
- | 1000 | 0.013800 | 0.039781 | 0.989718 |
- | 1200 | 0.012700 | 0.034337 | 0.990887 |
- | 1400 | 0.005200 | 0.037403 | 0.991120 |
- | 1600 | 0.001800 | 0.047661 | 0.990653 |
- | 1800 | 0.000900 | 0.051354 | 0.991237 |
- | 2000 | 0.001000 | 0.046224 | 0.990419 |
- | 2200 | 0.000200 | 0.046582 | 0.991120 |
- | 2400 | 0.000600 | 0.046632 | 0.990536 |
- | 2600 | 0.000100 | 0.051437 | 0.990770 |
- | 2800 | 0.000500 | 0.049085 | 0.990887 |
- | 3000 | 0.000400 | 0.049938 | 0.991004 |
-
- This repository has the notebook used to train this model.

  ## Usage
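The diff above describes a reward model trained on (`prompt`, preferred completion, rejected completion) triples. A common objective for such models is a pairwise ranking loss over the two completions' scores; the sketch below illustrates that objective in plain Python (the function name is mine, and the repository's notebook may implement the loss differently):

```python
import math

def pairwise_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """Common reward-model objective: -log(sigmoid(r_chosen - r_rejected)).
    Small when the preferred completion outscores the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ranked pair yields a small loss; a reversed pair a large one.
print(round(pairwise_ranking_loss(2.0, -1.0), 3))  # 0.049
print(round(pairwise_ranking_loss(-1.0, 2.0), 3))  # 3.049
```

Minimizing this loss pushes the score of the preferred completion above the rejected one, which is exactly the ranking behavior the accuracy column in the removed training table measures.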
@@ -120,7 +100,7 @@ This will output the following:

  | Acc | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) |
  |---|---|
- | [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) |

  * *Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.
  The model was trained with a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.

  ## Details

  - **Size:** 109,038,209 parameters
  - **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
  - **Language:** English
+ - **Number of Training Steps:** 1200
  - **Batch size:** 42
  - **Optimizer:** `torch.optim.AdamW`
  - **Learning Rate:** 5e-5
  - **GPU:** 1 NVIDIA A100-SXM4-40GB
+ - **Emissions:** 0.08 KgCO2 (Singapore)
+ - **Total Energy Consumption:** 0.16 kWh
+
+ This repository has the [notebook](RewardModel.ipynb) used to train this model.

  ## Usage
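The new **Emissions** and **Total Energy Consumption** entries are consistent with a simple energy-times-grid-intensity estimate. The sketch below shows that arithmetic; the 0.5 kgCO2/kWh figure is my rough assumption for Singapore's grid carbon intensity, not a number from the repository:

```python
def estimated_emissions_kg(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """CO2 estimate: measured energy draw times the local grid's
    carbon intensity (kgCO2 per kWh)."""
    return energy_kwh * intensity_kg_per_kwh

# 0.16 kWh at ~0.5 kgCO2/kWh (assumed) reproduces the reported ~0.08 KgCO2.
print(round(estimated_emissions_kg(0.16, 0.5), 2))  # 0.08
```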
  | Acc | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) |
  |---|---|
+ | [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 55.02%* |

  * *Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.
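The 55.02% figure added above is a pairwise accuracy: the share of WebGPT comparisons in which the model scores the human-preferred answer higher than the rejected one. A minimal sketch of that metric on made-up scores (the pairs below are illustrative, not data from the evaluation):

```python
def pairwise_accuracy(scored_pairs) -> float:
    """Fraction of (chosen, rejected) score pairs where the preferred
    completion received the higher reward (ties count as misses)."""
    hits = sum(1 for chosen, rejected in scored_pairs if chosen > rejected)
    return hits / len(scored_pairs)

# Two of these four hypothetical comparisons are ranked correctly.
pairs = [(1.2, 0.3), (0.1, 0.9), (2.0, 1.5), (0.7, 0.7)]
print(pairwise_accuracy(pairs))  # 0.5
```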