silanm / nlp-a5

@@ -1,80 +1,80 @@
----
-library_name: transformers
-license: mit
-base_model: gpt2
-tags:
-- trl
-- dpo
-- generated_from_trainer
-model-index:
-- name: nlp-a5
-  results: []
----
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# nlp-a5
-This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.6409
-- Rewards/chosen: 0.9778
-- Rewards/rejected: -2.1491
-- Rewards/accuracies: 0.8235
-- Rewards/margins: 3.1270
-- Logps/rejected: -410.6469
-- Logps/chosen: -337.3829
-- Logits/rejected: -66.9816
-- Logits/chosen: -67.8481
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 5.38e-05
-- train_batch_size: 8
-- eval_batch_size: 8
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 32
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_steps: 50
-- training_steps: 500
-### Training results
-| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
-|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
-| 0.6454        | 0.1382 | 50   | 0.7701          | 0.4667         | -1.3878          | 0.7591             | 1.8546          | -406.8403      | -339.9385    | -95.7163        | -95.3393      |
-| 0.7265        | 0.2764 | 100  | 0.7531          | 0.2791         | -2.1548          | 0.7777             | 2.4339          | -410.6752      | -340.8765    | -85.4456        | -85.2691      |
-| 0.5317        | 0.4147 | 150  | 0.7164          | 0.0401         | -2.6230          | 0.7743             | 2.6631          | -413.0164      | -342.0717    | -77.7900        | -78.4781      |
-| 0.8947        | 0.5529 | 200  | 0.7223          | -0.0327        | -3.1585          | 0.7961             | 3.1258          | -415.6938      | -342.4356    | -73.7223        | -74.3845      |
-| 0.6882        | 0.6911 | 250  | 0.6677          | 0.6186         | -2.0402          | 0.7904             | 2.6588          | -410.1023      | -339.1790    | -66.4183        | -67.2267      |
-| 0.4596        | 0.8293 | 300  | 0.6199          | 0.5863         | -2.4937          | 0.8116             | 3.0800          | -412.3698      | -339.3405    | -66.5151        | -67.2825      |
-| 0.6719        | 0.9675 | 350  | 0.6214          | 1.1018         | -1.4390          | 0.7842             | 2.5408          | -407.0965      | -336.7633    | -64.9415        | -65.8130      |
-| 0.119         | 1.1057 | 400  | 0.6442          | 0.4069         | -2.8694          | 0.8282             | 3.2763          | -414.2482      | -340.2375    | -64.6611        | -65.4554      |
-| 0.1427        | 1.2440 | 450  | 0.6730          | 1.1133         | -1.9897          | 0.8131             | 3.1030          | -409.8499      | -336.7056    | -65.8348        | -66.7287      |
-| 0.1022        | 1.3822 | 500  | 0.6409          | 0.9778         | -2.1491          | 0.8235             | 3.1270          | -410.6469      | -337.3829    | -66.9816        | -67.8481      |
-### Framework versions
-- Transformers 4.45.0
-- Pytorch 2.4.0+cu124
-- Datasets 3.2.0
-- Tokenizers 0.20.3

+---
+library_name: transformers
+license: mit
+base_model: gpt2
+tags:
+- trl
+- dpo
+- generated_from_trainer
+model-index:
+- name: nlp-a5
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# nlp-a5
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the [`distilabel-intel-orca-dpo-pairs`](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.6409
+- Rewards/chosen: 0.9778
+- Rewards/rejected: -2.1491
+- Rewards/accuracies: 0.8235
+- Rewards/margins: 3.1270
+- Logps/rejected: -410.6469
+- Logps/chosen: -337.3829
+- Logits/rejected: -66.9816
+- Logits/chosen: -67.8481
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5.38e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 32
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_steps: 50
+- training_steps: 500
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
+|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
+| 0.6454        | 0.1382 | 50   | 0.7701          | 0.4667         | -1.3878          | 0.7591             | 1.8546          | -406.8403      | -339.9385    | -95.7163        | -95.3393      |
+| 0.7265        | 0.2764 | 100  | 0.7531          | 0.2791         | -2.1548          | 0.7777             | 2.4339          | -410.6752      | -340.8765    | -85.4456        | -85.2691      |
+| 0.5317        | 0.4147 | 150  | 0.7164          | 0.0401         | -2.6230          | 0.7743             | 2.6631          | -413.0164      | -342.0717    | -77.7900        | -78.4781      |
+| 0.8947        | 0.5529 | 200  | 0.7223          | -0.0327        | -3.1585          | 0.7961             | 3.1258          | -415.6938      | -342.4356    | -73.7223        | -74.3845      |
+| 0.6882        | 0.6911 | 250  | 0.6677          | 0.6186         | -2.0402          | 0.7904             | 2.6588          | -410.1023      | -339.1790    | -66.4183        | -67.2267      |
+| 0.4596        | 0.8293 | 300  | 0.6199          | 0.5863         | -2.4937          | 0.8116             | 3.0800          | -412.3698      | -339.3405    | -66.5151        | -67.2825      |
+| 0.6719        | 0.9675 | 350  | 0.6214          | 1.1018         | -1.4390          | 0.7842             | 2.5408          | -407.0965      | -336.7633    | -64.9415        | -65.8130      |
+| 0.119         | 1.1057 | 400  | 0.6442          | 0.4069         | -2.8694          | 0.8282             | 3.2763          | -414.2482      | -340.2375    | -64.6611        | -65.4554      |
+| 0.1427        | 1.2440 | 450  | 0.6730          | 1.1133         | -1.9897          | 0.8131             | 3.1030          | -409.8499      | -336.7056    | -65.8348        | -66.7287      |
+| 0.1022        | 1.3822 | 500  | 0.6409          | 0.9778         | -2.1491          | 0.8235             | 3.1270          | -410.6469      | -337.3829    | -66.9816        | -67.8481      |
+### Framework versions
+- Transformers 4.45.0
+- Pytorch 2.4.0+cu124
+- Datasets 3.2.0
+- Tokenizers 0.20.3
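The training hyperparameters in this card describe a linear warmup-then-decay learning-rate schedule. Below is a minimal standard-library sketch of that schedule; it is not the author's training script. It assumes the usual `transformers` "linear" shape: the rate ramps from 0 to the peak over the warmup steps and then decays linearly to 0 at the last step, the same shape as `get_linear_schedule_with_warmup`. It also assumes a single device, which is consistent with 8 * 4 = 32 for `total_train_batch_size`. The constant and function names are hypothetical.

```python
# Values copied from the hyperparameter list in the model card.
PEAK_LR = 5.38e-05
WARMUP_STEPS = 50
TRAINING_STEPS = 500
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 4
NUM_DEVICES = 1  # assumption: the card lists no multi-device settings


def linear_warmup_lr(step: int) -> float:
    """Learning rate at optimizer step `step` (hypothetical helper).

    Ramps linearly from 0 to PEAK_LR over WARMUP_STEPS, then decays
    linearly back to 0 at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return PEAK_LR * remaining / (TRAINING_STEPS - WARMUP_STEPS)


# Number of training examples that contribute to each optimizer step.
total_train_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES

if __name__ == "__main__":
    for step in (0, 25, 50, 275, 500):
        print(step, f"{linear_warmup_lr(step):.3e}")
    print("total_train_batch_size:", total_train_batch_size)
```

Because the scheduler advances once per optimizer step, the 500 `training_steps` count accumulated batches of 32 examples, not micro-batches of 8.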
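The `trl`/`dpo` tags and the metric names suggest that the evaluation numbers come from TRL's DPO trainer. Here is a minimal sketch of how those metrics relate for a single preference pair. It assumes the default sigmoid DPO loss and a hypothetical `BETA = 0.1`. The card does not record beta; 0.1 is TRL's default. Each reward is beta times the policy-minus-reference log-probability of a response. The margin is the chosen reward minus the rejected reward, and the loss is -log σ(margin). The helper names are illustrative and are not TRL's API. The final loop checks the results table: at every evaluation step, Rewards/margins should equal Rewards/chosen minus Rewards/rejected, up to rounding.

```python
import math

# Hypothetical: the card does not record beta; 0.1 is TRL's DPOConfig default.
BETA = 0.1


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_metrics(policy_chosen: float, policy_rejected: float,
                     ref_chosen: float, ref_rejected: float,
                     beta: float = BETA) -> dict:
    """Sigmoid DPO loss and reward metrics for one preference pair.

    Each argument is the summed log-probability of a whole response under
    the policy or the frozen reference model (cf. Logps/chosen, Logps/rejected).
    """
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    return {
        "loss": -log_sigmoid(margin),
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 only when the chosen response earns strictly more reward
        "rewards/accuracies": 1.0 if reward_chosen > reward_rejected else 0.0,
    }


# (step, Rewards/chosen, Rewards/rejected, Rewards/margins) from the results table
EVAL_ROWS = [
    (50, 0.4667, -1.3878, 1.8546),
    (100, 0.2791, -2.1548, 2.4339),
    (150, 0.0401, -2.6230, 2.6631),
    (200, -0.0327, -3.1585, 3.1258),
    (250, 0.6186, -2.0402, 2.6588),
    (300, 0.5863, -2.4937, 3.0800),
    (350, 1.1018, -1.4390, 2.5408),
    (400, 0.4069, -2.8694, 3.2763),
    (450, 1.1133, -1.9897, 3.1030),
    (500, 0.9778, -2.1491, 3.1270),
]

# Averaging is linear, so mean margin = mean chosen - mean rejected,
# up to the 4-decimal rounding used in the table.
for step, chosen, rejected, margin in EVAL_ROWS:
    assert abs((chosen - rejected) - margin) < 2e-4, f"step {step}"
```

At step 500, -log σ(3.1270) is about 0.043, far below the reported loss of 0.6409. This is expected. The loss is the mean of per-pair losses, and because -log σ is convex, that mean can only be at least -log σ(mean margin). The gap reflects pairs with small or negative margins, which is consistent with a Rewards/accuracies value of 0.8235.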