Commit 16e2e2d
Parent(s): a235e2c

Update README.md

README.md CHANGED
@@ -25,40 +25,20 @@ The `RewardModel` is a [BERT](https://huggingface.co/bert-base-cased) model that

  The model was trained with a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.

- These prompt + completions are samples of intruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
-
  ## Details

  - **Size:** 109,038,209 parameters
  - **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
  - **Language:** English
- - **Number of
  - **Batch size:** 42
  - **Optimizer:** `torch.optim.AdamW`
  - **Learning Rate:** 5e-5
  - **GPU:** 1 NVIDIA A100-SXM4-40GB
- - **Emissions:** 0.
- - **Total Energy Consumption:** 0.
-
- | Step | Training Loss | Validation Loss | Accuracy |
- |---|---|---|---|
- | 200 | 0.080300 | 0.037106 | 0.987499 |
- | 400 | 0.039300 | 0.036421 | 0.988433 |
- | 600 | 0.037200 | 0.041799 | 0.986447 |
- | 800 | 0.011400 | 0.039411 | 0.989602 |
- | 1000 | 0.013800 | 0.039781 | 0.989718 |
- | 1200 | 0.012700 | 0.034337 | 0.990887 |
- | 1400 | 0.005200 | 0.037403 | 0.991120 |
- | 1600 | 0.001800 | 0.047661 | 0.990653 |
- | 1800 | 0.000900 | 0.051354 | 0.991237 |
- | 2000 | 0.001000 | 0.046224 | 0.990419 |
- | 2200 | 0.000200 | 0.046582 | 0.991120 |
- | 2400 | 0.000600 | 0.046632 | 0.990536 |
- | 2600 | 0.000100 | 0.051437 | 0.990770 |
- | 2800 | 0.000500 | 0.049085 | 0.990887 |
- | 3000 | 0.000400 | 0.049938 | 0.991004 |
-
- This repository has the notebook used to train this model.

  ## Usage
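The diff above describes a reward model trained on (`prompt`, preferred completion, rejected completion) triples. A common objective for such models is a pairwise ranking loss over the two completions' scores; the sketch below illustrates that objective in plain Python (the function name is mine, and the repository's notebook may implement the loss differently):

```python
import math

def pairwise_ranking_loss(r_chosen: float, r_rejected: float) -> float:
    """Common reward-model objective: -log(sigmoid(r_chosen - r_rejected)).
    Small when the preferred completion outscores the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ranked pair yields a small loss; a reversed pair a large one.
print(round(pairwise_ranking_loss(2.0, -1.0), 3))  # 0.049
print(round(pairwise_ranking_loss(-1.0, 2.0), 3))  # 3.049
```

Minimizing this loss pushes the score of the preferred completion above the rejected one, which is exactly the ranking behavior the accuracy column in the removed training table measures.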
@@ -120,7 +100,7 @@ This will output the following:

  | Acc | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) |
  |---|---|
- | [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) |

  * *Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.
  The model was trained with a dataset composed of `prompt`, `prefered_completions`, and `rejected_completions`.

  ## Details

  - **Size:** 109,038,209 parameters
  - **Dataset:** [Reward-Aira Dataset](https://huggingface.co/datasets/nicholasKluge/reward-aira-dataset)
  - **Language:** English
+ - **Number of Training Steps:** 1200
  - **Batch size:** 42
  - **Optimizer:** `torch.optim.AdamW`
  - **Learning Rate:** 5e-5
  - **GPU:** 1 NVIDIA A100-SXM4-40GB
+ - **Emissions:** 0.08 KgCO2 (Singapore)
+ - **Total Energy Consumption:** 0.16 kWh
+
+ This repository has the [notebook](RewardModel.ipynb) used to train this model.

  ## Usage
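The new **Emissions** and **Total Energy Consumption** entries are consistent with a simple energy-times-grid-intensity estimate. The sketch below shows that arithmetic; the 0.5 kgCO2/kWh figure is my rough assumption for Singapore's grid carbon intensity, not a number from the repository:

```python
def estimated_emissions_kg(energy_kwh: float, intensity_kg_per_kwh: float) -> float:
    """CO2 estimate: measured energy draw times the local grid's
    carbon intensity (kgCO2 per kWh)."""
    return energy_kwh * intensity_kg_per_kwh

# 0.16 kWh at ~0.5 kgCO2/kWh (assumed) reproduces the reported ~0.08 KgCO2.
print(round(estimated_emissions_kg(0.16, 0.5), 2))  # 0.08
```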
  | Acc | [WebGPT](https://huggingface.co/datasets/openai/webgpt_comparisons) |
  |---|---|
+ | [Aira-RewardModel](https://huggingface.co/nicholasKluge/RewardModel) | 55.02%* |

  * *Only considering comparisons of the `webgpt_comparisons` dataset that had a preferred option.
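The 55.02% figure added above is a pairwise accuracy: the share of WebGPT comparisons in which the model scores the human-preferred answer higher than the rejected one. A minimal sketch of that metric on made-up scores (the pairs below are illustrative, not data from the evaluation):

```python
def pairwise_accuracy(scored_pairs) -> float:
    """Fraction of (chosen, rejected) score pairs where the preferred
    completion received the higher reward (ties count as misses)."""
    hits = sum(1 for chosen, rejected in scored_pairs if chosen > rejected)
    return hits / len(scored_pairs)

# Two of these four hypothetical comparisons are ranked correctly.
pairs = [(1.2, 0.3), (0.1, 0.9), (2.0, 1.5), (0.7, 0.7)]
print(pairwise_accuracy(pairs))  # 0.5
```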