NasimB/gpt2_left_out_wikipedia

+---
+license: mit
+tags:
+- generated_from_trainer
+datasets:
+- generator
+model-index:
+- name: gpt2_left_out_wikipedia
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# gpt2_left_out_wikipedia
+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on a dataset that the Trainer recorded only under the name `generator`.
+It achieves the following results on the evaluation set:
+- Loss: 3.8366 (matches the validation loss at the last logged step, 18500, in the training results table)
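The evaluation loss is easier to compare across models when expressed as perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language modeling with the Trainer but is not stated in this card. The helper name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)


# Evaluation loss reported in this card.
final_eval_loss = 3.8366

# Assumption: the loss is a mean token-level cross-entropy measured in nats.
print(f"perplexity: {perplexity(final_eval_loss):.2f}")
```

Because perplexity is the exponential of the loss, small differences in loss translate into proportionally larger differences in perplexity.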
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 10
+- mixed_precision_training: Native AMP
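To make the learning-rate settings above concrete, here is a minimal pure-Python sketch of a cosine schedule with linear warmup. It assumes the common half-cosine shape (linear ramp to the peak, then decay to zero over the remaining steps). It also assumes 18750 total optimizer steps, inferred from the results table, where step 7500 corresponds to epoch 4.0, giving 1875 steps per epoch over 10 epochs. The function name `cosine_lr_with_warmup` is hypothetical and is not the Trainer's own implementation.

```python
import math

BASE_LR = 5e-4       # learning_rate from the hyperparameter list
WARMUP_STEPS = 1000  # lr_scheduler_warmup_steps from the hyperparameter list
# Assumption: 10 epochs x 1875 steps per epoch (step 7500 = epoch 4.0 in the results table).
TOTAL_STEPS = 18750


def cosine_lr_with_warmup(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 500, 1000, 9875, 18750):
    print(step, f"{cosine_lr_with_warmup(step):.6f}")
```

Under these assumptions the rate peaks at 0.0005 at step 1000, roughly 5% of the way through training, and falls to half that value midway through the decay phase.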
+### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 5.8141        | 0.27  | 500   | 4.8520          |
+| 4.5861        | 0.53  | 1000  | 4.4909          |
+| 4.3045        | 0.8   | 1500  | 4.2742          |
+| 4.0861        | 1.07  | 2000  | 4.1490          |
+| 3.9278        | 1.33  | 2500  | 4.0562          |
+| 3.8591        | 1.6   | 3000  | 3.9800          |
+| 3.7835        | 1.87  | 3500  | 3.9083          |
+| 3.6499        | 2.13  | 4000  | 3.8799          |
+| 3.567         | 2.4   | 4500  | 3.8381          |
+| 3.5361        | 2.67  | 5000  | 3.7975          |
+| 3.5278        | 2.93  | 5500  | 3.7552          |
+| 3.3555        | 3.2   | 6000  | 3.7622          |
+| 3.3265        | 3.47  | 6500  | 3.7426          |
+| 3.3305        | 3.73  | 7000  | 3.7122          |
+| 3.3246        | 4.0   | 7500  | 3.6889          |
+| 3.0968        | 4.27  | 8000  | 3.7216          |
+| 3.1248        | 4.53  | 8500  | 3.7057          |
+| 3.1354        | 4.8   | 9000  | 3.6846          |
+| 3.0701        | 5.07  | 9500  | 3.7066          |
+| 2.8974        | 5.33  | 10000 | 3.7183          |
+| 2.9258        | 5.6   | 10500 | 3.7096          |
+| 2.9387        | 5.87  | 11000 | 3.6943          |
+| 2.7975        | 6.13  | 11500 | 3.7369          |
+| 2.6972        | 6.4   | 12000 | 3.7468          |
+| 2.7193        | 6.67  | 12500 | 3.7422          |
+| 2.7233        | 6.93  | 13000 | 3.7337          |
+| 2.5434        | 7.2   | 13500 | 3.7783          |
+| 2.5072        | 7.47  | 14000 | 3.7864          |
+| 2.5183        | 7.73  | 14500 | 3.7869          |
+| 2.5263        | 8.0   | 15000 | 3.7838          |
+| 2.3533        | 8.27  | 15500 | 3.8174          |
+| 2.3661        | 8.53  | 16000 | 3.8220          |
+| 2.3659        | 8.8   | 16500 | 3.8246          |
+| 2.3462        | 9.07  | 17000 | 3.8313          |
+| 2.286         | 9.33  | 17500 | 3.8359          |
+| 2.2867        | 9.6   | 18000 | 3.8367          |
+| 2.2885        | 9.87  | 18500 | 3.8366          |
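The table shows validation loss reaching its minimum well before the end of training while training loss keeps falling, a pattern consistent with overfitting. The sketch below copies the rows from the table and finds the step with the lowest validation loss and the train/validation gap at the last logged step. The helper names are illustrative. Whether a checkpoint for the best step was saved is not stated in this card.

```python
# (step, training_loss, validation_loss), copied from the results table.
ROWS = [
    (500, 5.8141, 4.8520), (1000, 4.5861, 4.4909), (1500, 4.3045, 4.2742),
    (2000, 4.0861, 4.1490), (2500, 3.9278, 4.0562), (3000, 3.8591, 3.9800),
    (3500, 3.7835, 3.9083), (4000, 3.6499, 3.8799), (4500, 3.567, 3.8381),
    (5000, 3.5361, 3.7975), (5500, 3.5278, 3.7552), (6000, 3.3555, 3.7622),
    (6500, 3.3265, 3.7426), (7000, 3.3305, 3.7122), (7500, 3.3246, 3.6889),
    (8000, 3.0968, 3.7216), (8500, 3.1248, 3.7057), (9000, 3.1354, 3.6846),
    (9500, 3.0701, 3.7066), (10000, 2.8974, 3.7183), (10500, 2.9258, 3.7096),
    (11000, 2.9387, 3.6943), (11500, 2.7975, 3.7369), (12000, 2.6972, 3.7468),
    (12500, 2.7193, 3.7422), (13000, 2.7233, 3.7337), (13500, 2.5434, 3.7783),
    (14000, 2.5072, 3.7864), (14500, 2.5183, 3.7869), (15000, 2.5263, 3.7838),
    (15500, 2.3533, 3.8174), (16000, 2.3661, 3.8220), (16500, 2.3659, 3.8246),
    (17000, 2.3462, 3.8313), (17500, 2.286, 3.8359), (18000, 2.2867, 3.8367),
    (18500, 2.2885, 3.8366),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(row):
    """Validation loss minus training loss for one logged step."""
    return row[2] - row[1]


best = best_checkpoint(ROWS)
last = ROWS[-1]
print(f"lowest validation loss: {best[2]} at step {best[0]}")
print(f"gap at best step: {generalization_gap(best):.4f}")
print(f"gap at last step: {generalization_gap(last):.4f}")
```

The lowest validation loss in the table is 3.6846 at step 9000 (epoch 4.8), compared with 3.8366 at the last logged step, so an earlier checkpoint may generalize better than the final one if one is available.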
+### Framework versions
+- Transformers 4.26.1
+- Pytorch 1.11.0+cu113
+- Datasets 2.13.0
+- Tokenizers 0.13.3
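When trying to reproduce the run, it can help to compare a local environment against the versions listed above. This is a minimal sketch using the standard library's `importlib.metadata`. It assumes the packages are installed under their usual PyPI distribution names (`torch` for Pytorch), and the helper `version_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.26.1",
    "torch": "1.11.0+cu113",
    "datasets": "2.13.0",
    "tokenizers": "0.13.3",
}


def version_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, found)} for packages that differ or are missing."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            found = lookup(package)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in version_mismatches(EXPECTED).items():
    print(f"{package}: card lists {wanted}, local environment has {found}")
```

An exact string match is deliberately strict. A different CUDA build suffix on `torch`, for example, is reported as a mismatch even though it may not affect results.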