NasimB/gpt2-dp-3

+---
+license: mit
+tags:
+- generated_from_trainer
+datasets:
+- generator
+model-index:
+- name: gpt2-dp-3
+  results: []
+---
+# gpt2-dp-3
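+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the `generator` dataset. `generator` is the name the Trainer recorded; it usually indicates a dataset built with `Dataset.from_generator`, so the underlying corpus is not identified here.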
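+It achieves the following results on the evaluation set (the value matches the last logged row, step 18500):
+- Loss: 4.4076
+- Lowest validation loss during training: 4.2402 (step 9000, epoch 4.81)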
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 10
+- mixed_precision_training: Native AMP
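The combination of `lr_scheduler_type: cosine` and `lr_scheduler_warmup_steps: 1000` is usually read as a linear ramp from 0 to the peak learning rate over the first 1000 steps, followed by a half-cosine decay to 0 at the end of training. The sketch below reproduces that shape in plain Python so the learning rate at any step can be inspected. The total step count is not stated in this card; `TOTAL_STEPS = 18_720` is an assumption estimated from the step/epoch ratio in the results table, and `lr_at` is a hypothetical helper name, not part of any library.

```python
import math

BASE_LR = 5e-4        # learning_rate from the hyperparameter list
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps from the hyperparameter list
TOTAL_STEPS = 18_720  # ASSUMPTION: estimated from the results table, not stated in the card


def lr_at(step: int,
          base_lr: float = BASE_LR,
          warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < warmup:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    # Half cosine: factor 1.0 at progress 0, factor 0.0 at progress 1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 500, 1000, 9000, 18500):
        print(f"step {step:>5}: lr = {lr_at(step):.6f}")
```

Under these assumptions the peak rate of 0.0005 is reached at step 1000 and the rate is close to zero by the last logged step. A different true step count would shift the decay curve but not its shape.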
+### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 6.7156        | 0.27  | 500   | 5.6535          |
+| 5.3578        | 0.53  | 1000  | 5.2045          |
+| 5.0077        | 0.8   | 1500  | 4.9659          |
+| 4.7593        | 1.07  | 2000  | 4.8126          |
+| 4.5687        | 1.34  | 2500  | 4.7029          |
+| 4.4766        | 1.6   | 3000  | 4.5953          |
+| 4.3917        | 1.87  | 3500  | 4.5056          |
+| 4.2228        | 2.14  | 4000  | 4.4626          |
+| 4.1279        | 2.4   | 4500  | 4.4147          |
+| 4.1019        | 2.67  | 5000  | 4.3627          |
+| 4.0683        | 2.94  | 5500  | 4.3206          |
+| 3.869         | 3.21  | 6000  | 4.3295          |
+| 3.8494        | 3.47  | 6500  | 4.3034          |
+| 3.8533        | 3.74  | 7000  | 4.2734          |
+| 3.8342        | 4.01  | 7500  | 4.2661          |
+| 3.5799        | 4.27  | 8000  | 4.2817          |
+| 3.6163        | 4.54  | 8500  | 4.2654          |
+| 3.6245        | 4.81  | 9000  | 4.2402          |
+| 3.5328        | 5.07  | 9500  | 4.2692          |
+| 3.3455        | 5.34  | 10000 | 4.2804          |
+| 3.3898        | 5.61  | 10500 | 4.2662          |
+| 3.3933        | 5.88  | 11000 | 4.2519          |
+| 3.2239        | 6.14  | 11500 | 4.3025          |
+| 3.1152        | 6.41  | 12000 | 4.3098          |
+| 3.14          | 6.68  | 12500 | 4.3060          |
+| 3.1585        | 6.94  | 13000 | 4.2908          |
+| 2.9392        | 7.21  | 13500 | 4.3478          |
+| 2.9031        | 7.48  | 14000 | 4.3549          |
+| 2.9201        | 7.75  | 14500 | 4.3523          |
+| 2.9044        | 8.01  | 15000 | 4.3650          |
+| 2.7244        | 8.28  | 15500 | 4.3877          |
+| 2.7371        | 8.55  | 16000 | 4.3929          |
+| 2.745         | 8.81  | 16500 | 4.3943          |
+| 2.7233        | 9.08  | 17000 | 4.4028          |
+| 2.6481        | 9.35  | 17500 | 4.4060          |
+| 2.6578        | 9.62  | 18000 | 4.4077          |
+| 2.6554        | 9.88  | 18500 | 4.4076          |
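To make the table easier to read, the sketch below locates the evaluation step with the lowest validation loss and converts losses to perplexity. It assumes the reported loss is the mean per-token cross-entropy in nats, so that perplexity is `exp(loss)`. The validation values are copied from the table above; `best_checkpoint` and `perplexity` are hypothetical helper names.

```python
import math

# Validation losses copied from the "Training results" table, in order.
# Evaluations were logged every 500 steps, starting at step 500.
VAL_LOSSES = [
    5.6535, 5.2045, 4.9659, 4.8126, 4.7029, 4.5953, 4.5056, 4.4626,
    4.4147, 4.3627, 4.3206, 4.3295, 4.3034, 4.2734, 4.2661, 4.2817,
    4.2654, 4.2402, 4.2692, 4.2804, 4.2662, 4.2519, 4.3025, 4.3098,
    4.3060, 4.2908, 4.3478, 4.3549, 4.3523, 4.3650, 4.3877, 4.3929,
    4.3943, 4.4028, 4.4060, 4.4077, 4.4076,
]
EVAL_EVERY = 500


def best_checkpoint(losses, eval_every=EVAL_EVERY):
    """Return (step, loss) for the evaluation with the lowest loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return (idx + 1) * eval_every, losses[idx]


def perplexity(loss: float) -> float:
    """Perplexity, ASSUMING loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSSES)
    print(f"best step: {step}, loss {loss:.4f}, perplexity {perplexity(loss):.1f}")
    final = VAL_LOSSES[-1]
    print(f"final step: {len(VAL_LOSSES) * EVAL_EVERY}, loss {final:.4f}, "
          f"perplexity {perplexity(final):.1f}")
```

On these numbers the minimum falls at step 9000 (perplexity of roughly 69), while the final step sits at roughly 82. Training loss keeps falling to 2.6554 over the same span, which suggests the model overfits in the later epochs; an earlier checkpoint may generalize better if one was saved.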
+### Framework versions
+- Transformers 4.26.1
+- PyTorch 1.11.0+cu113
+- Datasets 2.13.0
+- Tokenizers 0.13.3