nicholasKluge committed on
Commit 16e2e2d · 1 Parent(s): a235e2c

Update README.md

Files changed (1):
  1. README.md +6 -26
README.md CHANGED
@@ -25,40 +25,20 @@ The `RewardModel` is a [BERT](https://huggingface.co/bert-base-cased) model that
 
 The model was trained with a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.
 
-These prompt + completions are samples of instruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
-
 ## Details
 
 - **Size:** 109,038,209 parameters
 - **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
 - **Language:** English
-- **Number of Epochs:** 5
+- **Number of Training Steps:** 1200
 - **Batch size:** 42
 - **Optimizer:** `torch.optim.AdamW`
 - **Learning Rate:** 5e-5
 - **GPU:** 1 NVIDIA A100-SXM4-40GB
-- **Emissions:** 0.17 KgCO2 (United States of America)
-- **Total Energy Consumption:** 0.48 kWh
-
-| Step|Training Loss|Validation Loss|Accuracy|
-|---|---|---|---|
-| 200 |0.080300|0.037106|0.987499|
-| 400 |0.039300|0.036421|0.988433|
-| 600 |0.037200|0.041799|0.986447|
-| 800 |0.011400|0.039411|0.989602|
-| 1000 |0.013800|0.039781|0.989718|
-| 1200 |0.012700|0.034337|0.990887|
-| 1400 |0.005200|0.037403|0.991120|
-| 1600 |0.001800|0.047661|0.990653|
-| 1800 |0.000900|0.051354|0.991237|
-| 2000 |0.001000|0.046224|0.990419|
-| 2200 |0.000200|0.046582|0.991120|
-| 2400 |0.000600|0.046632|0.990536|
-| 2600 |0.000100|0.051437|0.990770|
-| 2800 |0.000500|0.049085|0.990887|
-| 3000 |0.000400|0.049938|0.991004|
-
-This repository has the notebook used to train this model.
+- **Emissions:** 0.08 KgCO2 (Singapore)
+- **Total Energy Consumption:** 0.16 kWh
+
+This repository has the [notebook](RewardModel.ipynb) used to train this model.
 
 ## Usage
 
@@ -120,7 +100,7 @@ This will output the following:
 
 | Acc | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) |
 |---|---|
-| [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 52.61%* |
+| [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 55.02%* |
 
 * *Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.
 
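The updated evaluation row reports pairwise accuracy on WebGPT comparisons: the share of comparisons where the reward model scores the preferred completion above the rejected one. A minimal sketch of that metric, with a hypothetical toy scorer standing in for the actual model call (which this diff does not show):

```python
# Pairwise accuracy for a reward model, as in the WebGPT evaluation above.
# `score` is any callable mapping a completion to a scalar reward; the toy
# length-based scorer below is a stand-in, not the real RewardModel.

def pairwise_accuracy(pairs, score):
    """Fraction of (preferred, rejected) pairs the scorer ranks correctly."""
    correct = sum(1 for preferred, rejected in pairs
                  if score(preferred) > score(rejected))
    return correct / len(pairs)

if __name__ == "__main__":
    toy_score = len  # stand-in scorer: longer completions score higher
    pairs = [
        ("a detailed, sourced answer", "short answer"),
        ("thorough explanation with citations", "no"),
        ("ok", "a much longer but rejected answer"),
    ]
    print(f"{pairwise_accuracy(pairs, toy_score):.2%}")  # 66.67%
```

Comparisons without a preferred option are excluded before scoring, which is what the footnote above qualifies.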