Commit 2d3efeb
Parent(s): 69ade40
Update README.md
README.md CHANGED
@@ -23,6 +23,24 @@ The model was trained with a dataset composed of `prompt`, `completions`, and an
 
 > Note: These prompt + completions are samples of instruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
 
+## Details
+
+**Number of Epochs:** 5
+**Batch size:** 64
+**Optimizer:** `torch.optim.Adam`
+**Learning Rate:** 1e-4
+**Loss Function:** `torch.nn.MSELoss()`
+**GPU:** 1 NVIDIA A100-SXM4-40GB
+**RMSE in testing:** 0.1190
+
+| Epoch/Loss | Training | Validation |
+|------------ |---------- |------------ |
+| 1 | 0.06136 | 0.031815 |
+| 2 | 0.030398 | 0.02459 |
+| 3 | 0.02389 | 0.026569 |
+| 4 | 0.024755 | 0.021097 |
+| 5 | 0.019445 | 0.01416 |
+
 ## Usage
 
 Here's an example of how to use the `RewardModel` to score the quality of a response to a given prompt:
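The Details section added in this commit trains with `torch.nn.MSELoss()` (a mean *squared* error) but reports an RMSE for testing. RMSE is the square root of MSE, so the two figures are on different scales. The square root of the final validation loss, 0.01416, is about 0.1190, which matches the reported test RMSE. The README does not say whether the test and validation splits are the same. The sketch below is a minimal illustration under these assumptions. It is pure Python with no PyTorch dependency. The helper names `mse` and `rmse` and the `epoch_losses` dictionary are hypothetical and not part of this repository. The sketch converts the per-epoch MSE values from the table into RMSE.

```python
import math


def mse(predictions, targets):
    """Mean squared error: the average of the squared differences.

    This is the quantity torch.nn.MSELoss() computes with its default
    reduction="mean"; it is reimplemented here (hypothetical helper) so the
    sketch does not need PyTorch.
    """
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)


def rmse(predictions, targets):
    """Root mean squared error: sqrt(MSE), on the same scale as the scores."""
    return math.sqrt(mse(predictions, targets))


# (training MSE, validation MSE) per epoch, copied from the README table.
epoch_losses = {
    1: (0.06136, 0.031815),
    2: (0.030398, 0.02459),
    3: (0.02389, 0.026569),
    4: (0.024755, 0.021097),
    5: (0.019445, 0.01416),
}

# Convert each MSE value to an RMSE so it can be compared with the
# "RMSE in testing" figure.
for epoch, (train_mse, val_mse) in epoch_losses.items():
    print(f"epoch {epoch}: train RMSE {math.sqrt(train_mse):.4f}, "
          f"validation RMSE {math.sqrt(val_mse):.4f}")
```

Running the loop prints one line per epoch. The epoch 5 validation RMSE rounds to 0.1190.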