mNLP-project/gpt2-dpo-mcqa

@@ -1,4 +1,5 @@
 ---
 base_model: mNLP-project/gpt2-finetuned-mcqa
 tags:
 - trl
@@ -16,15 +17,15 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [mNLP-project/gpt2-finetuned-mcqa](https://huggingface.co/mNLP-project/gpt2-finetuned-mcqa) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.6371
-- Rewards/chosen: 1.8147
-- Rewards/rejected: 1.4746
-- Rewards/accuracies: 0.6429
-- Rewards/margins: 0.3401
-- Logps/rejected: -595.2877
-- Logps/chosen: -712.7159
-- Logits/rejected: 3.3478
-- Logits/chosen: 2.3916
 ## Model description
@@ -43,7 +44,7 @@ More information needed
 ### Training hyperparameters
 The following hyperparameters were used during training:
-- learning_rate: 1e-06
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
@@ -58,16 +59,16 @@ The following hyperparameters were used during training:
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6185        | 0.9993 | 668  | 0.6383          | 1.3102         | 1.0521           | 0.6396             | 0.2580          | -599.5124      | -717.7612    | 3.2114          | 2.3318        |
-| 0.6482        | 2.0    | 1337 | 0.6605          | 1.4570         | 1.2176           | 0.6194             | 0.2394          | -597.8582      | -716.2932    | 3.5209          | 2.5720        |
-| 0.5926        | 2.9993 | 2005 | 0.6371          | 1.8147         | 1.4746           | 0.6429             | 0.3401          | -595.2877      | -712.7159    | 3.3478          | 2.3916        |
-| 0.5284        | 4.0    | 2674 | 0.6425          | 1.8648         | 1.5295           | 0.6276             | 0.3354          | -594.7390      | -712.2144    | 3.4174          | 2.4301        |
-| 0.4941        | 4.9993 | 3342 | 0.6490          | 2.1245         | 1.7548           | 0.6313             | 0.3697          | -592.4860      | -709.6179    | 3.7487          | 2.7230        |
-| 0.4608        | 6.0    | 4011 | 0.6507          | 2.0729         | 1.7055           | 0.6284             | 0.3675          | -592.9789      | -710.1334    | 3.8444          | 2.7879        |
-| 0.4424        | 6.9993 | 4679 | 0.6553          | 2.0245         | 1.6718           | 0.6295             | 0.3527          | -593.3158      | -710.6180    | 3.9476          | 2.8726        |
-| 0.4302        | 8.0    | 5348 | 0.6553          | 2.1030         | 1.7333           | 0.6306             | 0.3698          | -592.7012      | -709.8326    | 4.0016          | 2.9177        |
-| 0.4161        | 8.9993 | 6016 | 0.6564          | 2.1260         | 1.7538           | 0.6328             | 0.3722          | -592.4957      | -709.6025    | 4.0053          | 2.9198        |
-| 0.4051        | 9.9925 | 6680 | 0.6566          | 2.1259         | 1.7535           | 0.6321             | 0.3724          | -592.4987      | -709.6038    | 4.0114          | 2.9244        |
 ### Framework versions

 ---
+license: mit
 base_model: mNLP-project/gpt2-finetuned-mcqa
 tags:
 - trl
 This model is a fine-tuned version of [mNLP-project/gpt2-finetuned-mcqa](https://huggingface.co/mNLP-project/gpt2-finetuned-mcqa) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.6310
+- Rewards/chosen: 1.4580
+- Rewards/rejected: 1.1845
+- Rewards/accuracies: 0.6414
+- Rewards/margins: 0.2735
+- Logps/rejected: -659.0944
+- Logps/chosen: -787.4795
+- Logits/rejected: -14.9328
+- Logits/chosen: -11.6364
 ## Model description
 ### Training hyperparameters
 The following hyperparameters were used during training:
+- learning_rate: 1e-07
 - train_batch_size: 8
 - eval_batch_size: 8
 - seed: 42
 | Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
 |:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6407        | 0.9993 | 668  | 0.6460          | 0.7721         | 0.6216           | 0.6295             | 0.1505          | -664.7236      | -794.3383    | -15.1273        | -11.7899      |
+| 0.6498        | 2.0    | 1337 | 0.6374          | 1.2927         | 1.0475           | 0.6325             | 0.2453          | -660.4651      | -789.1318    | -14.9517        | -11.6401      |
+| 0.6468        | 2.9993 | 2005 | 0.6342          | 1.3734         | 1.1102           | 0.6388             | 0.2632          | -659.8373      | -788.3249    | -14.9535        | -11.6481      |
+| 0.6113        | 4.0    | 2674 | 0.6332          | 1.3317         | 1.0769           | 0.6444             | 0.2548          | -660.1705      | -788.7426    | -14.9930        | -11.6897      |
+| 0.5826        | 4.9993 | 3342 | 0.6310          | 1.4580         | 1.1845           | 0.6414             | 0.2735          | -659.0944      | -787.4795    | -14.9328        | -11.6364      |
+| 0.5613        | 6.0    | 4011 | 0.6317          | 1.4979         | 1.2181           | 0.6407             | 0.2798          | -658.7584      | -787.0804    | -14.9234        | -11.6271      |
+| 0.581         | 6.9993 | 4679 | 0.6316          | 1.5084         | 1.2260           | 0.6437             | 0.2825          | -658.6798      | -786.9750    | -14.9319        | -11.6377      |
+| 0.571         | 8.0    | 5348 | 0.6320          | 1.4992         | 1.2184           | 0.6425             | 0.2808          | -658.7557      | -787.0676    | -14.9334        | -11.6373      |
+| 0.5943        | 8.9993 | 6016 | 0.6317          | 1.5126         | 1.2294           | 0.6437             | 0.2832          | -658.6454      | -786.9331    | -14.9226        | -11.6269      |
+| 0.5635        | 9.9925 | 6680 | 0.6317          | 1.5142         | 1.2308           | 0.6433             | 0.2835          | -658.6317      | -786.9168    | -14.9211        | -11.6256      |
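
The updated table can be cross-checked against the summary metrics with a few lines of Python. The sketch below copies five columns from the rows above. It checks two things that the numbers suggest but the card does not state: that `Rewards/margins` equals `Rewards/chosen` minus `Rewards/rejected` up to rounding, and that the reported evaluation results are those of the row with the lowest validation loss. The names `ROWS`, `max_margin_error`, and `best_checkpoint` are illustrative and are not part of the training code.

```python
# Columns: (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins),
# copied from the updated results table (learning_rate: 1e-07).
ROWS = [
    (668, 0.6460, 0.7721, 0.6216, 0.1505),
    (1337, 0.6374, 1.2927, 1.0475, 0.2453),
    (2005, 0.6342, 1.3734, 1.1102, 0.2632),
    (2674, 0.6332, 1.3317, 1.0769, 0.2548),
    (3342, 0.6310, 1.4580, 1.1845, 0.2735),
    (4011, 0.6317, 1.4979, 1.2181, 0.2798),
    (4679, 0.6316, 1.5084, 1.2260, 0.2825),
    (5348, 0.6320, 1.4992, 1.2184, 0.2808),
    (6016, 0.6317, 1.5126, 1.2294, 0.2832),
    (6680, 0.6317, 1.5142, 1.2308, 0.2835),
]


def max_margin_error(rows):
    """Largest absolute gap between (chosen - rejected) and the logged margin."""
    return max(abs((chosen - rejected) - margin)
               for _, _, chosen, rejected, margin in rows)


def best_checkpoint(rows):
    """Row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Gaps stay within the rounding of the four-decimal table entries.
    print("max margin error:", max_margin_error(ROWS))
    # The step printed here is the row whose metrics appear in the summary list.
    print("lowest validation loss at step:", best_checkpoint(ROWS)[0])
```

The same check on the previous table (learning_rate: 1e-06) selects step 2005 with loss 0.6371, which matches the summary metrics removed by this commit. How the checkpoint was actually selected during training is not stated in the card.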
 ### Framework versions

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:29b5c39873c02a0a785a4ecffe743df33d396127cf87eb35a11073fb87187cb8
 size 497774208

 version https://git-lfs.github.com/spec/v1
+oid sha256:709c372fa7291e9a81d46d9c732baa6bbf559619bd4cc5421efd2764dbfe2284
 size 497774208
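
The `model.safetensors` change above is a Git LFS pointer update: the size is unchanged and only the SHA-256 object id differs. A minimal sketch of parsing such a pointer and checking a payload against it, assuming the three-line `version` / `oid` / `size` layout shown above, could look like this. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of Git LFS or any Hugging Face library.

```python
import hashlib

# Pointer contents before and after the commit, copied from the diff above.
OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:29b5c39873c02a0a785a4ecffe743df33d396127cf87eb35a11073fb87187cb8
size 497774208
"""
NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:709c372fa7291e9a81d46d9c732baa6bbf559619bd4cc5421efd2764dbfe2284
size 497774208
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


if __name__ == "__main__":
    old, new = parse_lfs_pointer(OLD_POINTER), parse_lfs_pointer(NEW_POINTER)
    print("same size:", old["size"] == new["size"])
    print("same weights:", old["digest"] == new["digest"])
```

For a file of this size (about 498 MB), hashing in chunks from disk would be preferable to loading all bytes at once; the in-memory version is kept here only for brevity.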