nicholasKluge/RewardModel

nicholasKluge committed on Jun 7, 2023

Commit 2d3efeb · 1 Parent(s): 69ade40

Update README.md

Files changed (1)

README.md +18 -0

@@ -23,6 +23,24 @@ The model was trained with a dataset composed of `prompt`, `completions`, and an
 > Note: These prompt + completions are samples of instruction datasets created via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework.
+## Details
+**Number of Epochs:** 5
+**Batch size:** 64
+**Optimizer:** `torch.optim.Adam`
+**Learning Rate:** 1e-4
+**Loss Function:** `torch.nn.MSELoss()`
+**GPU:** 1 NVIDIA A100-SXM4-40GB
+**RMSE in testing:** 0.1190
+| Epoch/Loss | Training | Validation |
+|------------|----------|------------|
+| 1          | 0.06136  | 0.031815   |
+| 2          | 0.030398 | 0.02459    |
+| 3          | 0.02389  | 0.026569   |
+| 4          | 0.024755 | 0.021097   |
+| 5          | 0.019445 | 0.01416    |
 ## Usage
 Here's an example of how to use the `RewardModel` to score the quality of a response to a given prompt:
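
A note on the training details added in this commit: the reported test RMSE and the loss table can be related with a few lines of arithmetic. The sketch below is not part of the README. It assumes the table values are mean squared errors, which is consistent with the listed `torch.nn.MSELoss()` but not stated explicitly, and it copies the numbers from the diff above. The names `losses`, `rmse_from_mse`, `val_rmse`, and `best_epoch` are illustrative.

```python
import math

# Per-epoch losses copied from the table added in this commit:
# epoch -> (training loss, validation loss).
# Assumption: these are MSE values, matching the listed torch.nn.MSELoss().
losses = {
    1: (0.06136, 0.031815),
    2: (0.030398, 0.02459),
    3: (0.02389, 0.026569),
    4: (0.024755, 0.021097),
    5: (0.019445, 0.01416),
}


def rmse_from_mse(mse: float) -> float:
    """Convert a mean squared error into a root mean squared error."""
    return math.sqrt(mse)


# Validation RMSE for every epoch.
val_rmse = {epoch: rmse_from_mse(val) for epoch, (_, val) in losses.items()}

# Epoch with the lowest validation loss.
best_epoch = min(losses, key=lambda epoch: losses[epoch][1])

for epoch, (train, val) in losses.items():
    print(
        f"epoch {epoch}: train MSE={train:.6f}  "
        f"val MSE={val:.6f}  val RMSE={val_rmse[epoch]:.4f}"
    )
print(f"lowest validation loss at epoch {best_epoch}")
```

Under that assumption, the square root of the epoch-5 validation loss (0.01416) rounds to 0.1190, the same figure given as "RMSE in testing". The diff does not say whether the test and validation sets are the same, so this is an observation about the numbers rather than a statement about the evaluation setup. The table also shows that the losses do not fall monotonically: validation loss rises at epoch 3 and training loss rises at epoch 4 before both reach their lowest values at epoch 5.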