NasimB/gpt2_left_out_gutenberg

+---
+license: mit
+tags:
+- generated_from_trainer
+datasets:
+- generator
+model-index:
+- name: gpt2_left_out_gutenberg
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# gpt2_left_out_gutenberg
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on a dataset recorded only as `generator` in the training metadata.
+It achieves the following results on the evaluation set:
+- Loss: 3.9287
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 10
+- mixed_precision_training: Native AMP
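
The hyperparameters above pair a 1000-step warmup with a cosine schedule at a peak learning rate of 0.0005. As a rough illustration of what that implies per step, here is a minimal pure-Python sketch. It assumes the usual shape of a warmup-plus-cosine schedule (linear ramp from zero, then a half-cosine decay to zero), as in Transformers' `get_cosine_schedule_with_warmup`. `TOTAL_STEPS` is not stated in this card; the value below is an estimate from the results table (about 1,890 steps per epoch over 10 epochs), and the helper name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 5e-4        # learning_rate from the card
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 18900   # assumption: ~1,890 steps/epoch x 10 epochs, estimated from the results table


def lr_at(step: int, peak_lr: float = PEAK_LR,
          warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    # Half cosine: 1.0 at the end of warmup, 0.0 at the final step.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 500, 1000, 9950, 18900):
        print(f"step {step:>5}: lr = {lr_at(step):.6f}")
```

Under these assumptions the learning rate peaks at step 1000 and has fallen to half its peak by the midpoint of the decay phase, so the later checkpoints in the results table were trained with a much smaller step size than the early ones.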
+### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 5.8917        | 0.26  | 500   | 5.0150          |
+| 4.6559        | 0.53  | 1000  | 4.6338          |
+| 4.3512        | 0.79  | 1500  | 4.4091          |
+| 4.1461        | 1.06  | 2000  | 4.2691          |
+| 3.9654        | 1.32  | 2500  | 4.1719          |
+| 3.8972        | 1.59  | 3000  | 4.0869          |
+| 3.8271        | 1.85  | 3500  | 4.0113          |
+| 3.6889        | 2.12  | 4000  | 3.9762          |
+| 3.5860        | 2.38  | 4500  | 3.9376          |
+| 3.5724        | 2.65  | 5000  | 3.8870          |
+| 3.5435        | 2.91  | 5500  | 3.8480          |
+| 3.3888        | 3.17  | 6000  | 3.8520          |
+| 3.3327        | 3.44  | 6500  | 3.8282          |
+| 3.3538        | 3.70  | 7000  | 3.8039          |
+| 3.3427        | 3.97  | 7500  | 3.7743          |
+| 3.1287        | 4.23  | 8000  | 3.8093          |
+| 3.1293        | 4.50  | 8500  | 3.7959          |
+| 3.1508        | 4.76  | 9000  | 3.7735          |
+| 3.1169        | 5.03  | 9500  | 3.7815          |
+| 2.8937        | 5.29  | 10000 | 3.8078          |
+| 2.9281        | 5.56  | 10500 | 3.7999          |
+| 2.9357        | 5.82  | 11000 | 3.7869          |
+| 2.8489        | 6.08  | 11500 | 3.8165          |
+| 2.6858        | 6.35  | 12000 | 3.8367          |
+| 2.7074        | 6.61  | 12500 | 3.8300          |
+| 2.7252        | 6.88  | 13000 | 3.8234          |
+| 2.5862        | 7.14  | 13500 | 3.8661          |
+| 2.4957        | 7.41  | 14000 | 3.8772          |
+| 2.5091        | 7.67  | 14500 | 3.8791          |
+| 2.5155        | 7.94  | 15000 | 3.8773          |
+| 2.3794        | 8.20  | 15500 | 3.9064          |
+| 2.3490        | 8.47  | 16000 | 3.9130          |
+| 2.3595        | 8.73  | 16500 | 3.9154          |
+| 2.3579        | 8.99  | 17000 | 3.9160          |
+| 2.2743        | 9.26  | 17500 | 3.9268          |
+| 2.2753        | 9.52  | 18000 | 3.9287          |
+| 2.2734        | 9.79  | 18500 | 3.9287          |
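
In the table above, validation loss stops improving partway through training while training loss keeps falling. The sketch below locates the checkpoint with the lowest validation loss and converts losses to perplexity as `exp(loss)`. It assumes the reported loss is the mean per-token cross-entropy in nats, which this card does not state explicitly. The names `VAL_LOSS` and `perplexity` are hypothetical; the numbers are copied from the table.

```python
import math

# (step, validation loss) pairs copied from the results table above.
VAL_LOSS = [
    (500, 5.0150), (1000, 4.6338), (1500, 4.4091), (2000, 4.2691),
    (2500, 4.1719), (3000, 4.0869), (3500, 4.0113), (4000, 3.9762),
    (4500, 3.9376), (5000, 3.8870), (5500, 3.8480), (6000, 3.8520),
    (6500, 3.8282), (7000, 3.8039), (7500, 3.7743), (8000, 3.8093),
    (8500, 3.7959), (9000, 3.7735), (9500, 3.7815), (10000, 3.8078),
    (10500, 3.7999), (11000, 3.7869), (11500, 3.8165), (12000, 3.8367),
    (12500, 3.8300), (13000, 3.8234), (13500, 3.8661), (14000, 3.8772),
    (14500, 3.8791), (15000, 3.8773), (15500, 3.9064), (16000, 3.9130),
    (16500, 3.9154), (17000, 3.9160), (17500, 3.9268), (18000, 3.9287),
    (18500, 3.9287),
]


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats (an assumption here)."""
    return math.exp(loss)


# Checkpoint with the lowest validation loss, and the last logged checkpoint.
best_step, best_loss = min(VAL_LOSS, key=lambda row: row[1])
final_step, final_loss = VAL_LOSS[-1]

if __name__ == "__main__":
    print(f"best:  step {best_step}, loss {best_loss:.4f}, ppl {perplexity(best_loss):.1f}")
    print(f"final: step {final_step}, loss {final_loss:.4f}, ppl {perplexity(final_loss):.1f}")
```

The lowest validation loss is 3.7735 at step 9000 (epoch 4.76), against 3.9287 at the last logged step. If the loss is indeed per-token cross-entropy in nats, that corresponds to a perplexity of roughly 43.5 at the best checkpoint versus roughly 50.8 at the end, which suggests the model overfits in the second half of training.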
+### Framework versions
+- Transformers 4.26.1
+- Pytorch 1.11.0+cu113
+- Datasets 2.13.0
+- Tokenizers 0.13.3