nicholasKluge committed
Commit 2d3efeb · 1 Parent(s): 69ade40

Update README.md

Files changed (1):
  1. README.md +18 -0
README.md CHANGED
@@ -23,6 +23,24 @@ The model was trained with a dataset composed of `prompt`, `completions`, and an
 
 > Note: These prompt + completions are samples of instruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
 
+ ## Details
+
+ **Number of Epochs:** 5
+ **Batch size:** 64
+ **Optimizer:** `torch.optim.Adam`
+ **Learning Rate:** 1e-4
+ **Loss Function:** `torch.nn.MSELoss()`
+ **GPU:** 1 NVIDIA A100-SXM4-40GB
+ **RMSE in testing:** 0.1190
+
+ | Epoch/Loss | Training | Validation |
+ |------------|----------|------------|
+ | 1          | 0.06136  | 0.031815   |
+ | 2          | 0.030398 | 0.02459    |
+ | 3          | 0.02389  | 0.026569   |
+ | 4          | 0.024755 | 0.021097   |
+ | 5          | 0.019445 | 0.01416    |
+
 ## Usage
 
 Here's an example of how to use the `RewardModel` to score the quality of a response to a given prompt:
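The usage code itself is cut off in this diff. As a rough illustration of the scoring interface the README describes, here is a minimal, self-contained sketch; `TinyRewardModel`, its vocabulary size, and the toy word tokenizer are hypothetical stand-ins for illustration only, not the repository's actual `RewardModel` implementation. The scalar output head does match the regression setup (`torch.nn.MSELoss()`) listed under Details:

```python
# Hypothetical sketch: score a prompt + completion with a scalar-output
# reward model. `TinyRewardModel` and `tokenize` are stand-ins, NOT the
# actual RewardModel from this repository.
import torch
import torch.nn as nn


class TinyRewardModel(nn.Module):
    """Stand-in reward model: embed tokens, mean-pool, map to one score."""

    def __init__(self, vocab_size: int = 1000, dim: int = 32):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, dim)  # mean-pools tokens
        self.head = nn.Linear(dim, 1)  # scalar reward (MSE-regression head)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        pooled = self.embedding(token_ids)          # (batch, dim)
        return self.head(pooled).squeeze(-1)        # (batch,)


def tokenize(text: str, vocab_size: int = 1000) -> torch.Tensor:
    # Toy deterministic "tokenizer": map each word to an id in the vocab.
    ids = [sum(map(ord, word)) % vocab_size for word in text.split()]
    return torch.tensor([ids], dtype=torch.long)    # batch of one sequence


torch.manual_seed(0)
model = TinyRewardModel()
model.eval()

prompt = "Explain what a reward model does."
completion = "A reward model assigns a quality score to a response."
with torch.no_grad():
    score = model(tokenize(prompt + " " + completion))

print(f"reward score: {score.item():.4f}")  # one scalar per prompt+completion
```

In the real model the score would come from a learned head over a pretrained encoder; the point here is only the shape of the interface: text in, one scalar quality score out.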