NasimB/gpt2_left_out_qed

+---
+license: mit
+tags:
+- generated_from_trainer
+datasets:
+- generator
+model-index:
+- name: gpt2_left_out_qed
+  results: []
+---
+# gpt2_left_out_qed
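+This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on the `generator` dataset. That is the name the Trainer recorded, so the underlying corpus is not identified in this card.
+It achieves the following result on the evaluation set, which matches the last logged evaluation (step 18500) in the training results: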
+- Loss: 3.9486
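The loss above can also be read as a perplexity. The sketch below assumes the reported value is the mean per-token cross-entropy in nats, which the card does not state explicitly. The helper name `loss_to_perplexity` is illustrative and not part of any library.

```python
import math


def loss_to_perplexity(loss: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(loss)


# Evaluation loss reported in this card.
eval_loss = 3.9486
eval_perplexity = loss_to_perplexity(eval_loss)
print(f"perplexity = {eval_perplexity:.2f}")
```

Under that assumption the evaluation perplexity is a little under 52.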
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 1000
+- num_epochs: 10
+- mixed_precision_training: Native AMP
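To make the `cosine` schedule with 1000 warmup steps concrete, here is a minimal pure-Python sketch of a linear warmup followed by a half-cosine decay to zero. It is an illustration under stated assumptions, not the training code: `TOTAL_STEPS` is not given in the card and is estimated from the step and epoch columns of the training results (roughly 1885 steps per epoch over 10 epochs), and the function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-4        # learning_rate from the card
WARMUP_STEPS = 1000   # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 18_850  # ASSUMPTION: estimated from the results table, not stated in the card


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 500, 1000, 9925, 18_850):
    print(step, f"{cosine_lr(step):.6f}")
```

With these values the rate peaks at 0.0005 at step 1000 and is back to half of that midway through the decay. The exact endpoint depends on the true total step count.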
+### Training results
+| Training Loss | Epoch | Step  | Validation Loss |
+|:-------------:|:-----:|:-----:|:---------------:|
+| 5.9695        | 0.27  | 500   | 5.0679          |
+| 4.7417        | 0.53  | 1000  | 4.6811          |
+| 4.4136        | 0.8   | 1500  | 4.4369          |
+| 4.2076        | 1.06  | 2000  | 4.2985          |
+| 4.0279        | 1.33  | 2500  | 4.2048          |
+| 3.9505        | 1.59  | 3000  | 4.1137          |
+| 3.8781        | 1.86  | 3500  | 4.0482          |
+| 3.7338        | 2.12  | 4000  | 4.0046          |
+| 3.6392        | 2.39  | 4500  | 3.9628          |
+| 3.6228        | 2.65  | 5000  | 3.9115          |
+| 3.5944        | 2.92  | 5500  | 3.8738          |
+| 3.4222        | 3.18  | 6000  | 3.8797          |
+| 3.3836        | 3.45  | 6500  | 3.8576          |
+| 3.3995        | 3.71  | 7000  | 3.8251          |
+| 3.3827        | 3.98  | 7500  | 3.7995          |
+| 3.1568        | 4.24  | 8000  | 3.8348          |
+| 3.1778        | 4.51  | 8500  | 3.8171          |
+| 3.1853        | 4.77  | 9000  | 3.7963          |
+| 3.1451        | 5.04  | 9500  | 3.8059          |
+| 2.9278        | 5.31  | 10000 | 3.8298          |
+| 2.9608        | 5.57  | 10500 | 3.8176          |
+| 2.9762        | 5.84  | 11000 | 3.8047          |
+| 2.8716        | 6.1   | 11500 | 3.8433          |
+| 2.7239        | 6.37  | 12000 | 3.8523          |
+| 2.7435        | 6.63  | 12500 | 3.8541          |
+| 2.7524        | 6.9   | 13000 | 3.8446          |
+| 2.6032        | 7.16  | 13500 | 3.8854          |
+| 2.5322        | 7.43  | 14000 | 3.8967          |
+| 2.5369        | 7.69  | 14500 | 3.8983          |
+| 2.5467        | 7.96  | 15000 | 3.8966          |
+| 2.3979        | 8.22  | 15500 | 3.9284          |
+| 2.3767        | 8.49  | 16000 | 3.9334          |
+| 2.3852        | 8.75  | 16500 | 3.9357          |
+| 2.3805        | 9.02  | 17000 | 3.9395          |
+| 2.3012        | 9.28  | 17500 | 3.9463          |
+| 2.3044        | 9.55  | 18000 | 3.9484          |
+| 2.3007        | 9.81  | 18500 | 3.9486          |
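In the table above, validation loss stops improving partway through training while training loss keeps falling, which suggests overfitting in the later epochs. The sketch below locates the lowest logged validation loss. The values are copied from the table and the variable names are illustrative. The card does not say whether a checkpoint was saved at that step.

```python
# (step, validation loss) pairs copied from the training results table.
VAL_LOSS = [
    (500, 5.0679), (1000, 4.6811), (1500, 4.4369), (2000, 4.2985),
    (2500, 4.2048), (3000, 4.1137), (3500, 4.0482), (4000, 4.0046),
    (4500, 3.9628), (5000, 3.9115), (5500, 3.8738), (6000, 3.8797),
    (6500, 3.8576), (7000, 3.8251), (7500, 3.7995), (8000, 3.8348),
    (8500, 3.8171), (9000, 3.7963), (9500, 3.8059), (10000, 3.8298),
    (10500, 3.8176), (11000, 3.8047), (11500, 3.8433), (12000, 3.8523),
    (12500, 3.8541), (13000, 3.8446), (13500, 3.8854), (14000, 3.8967),
    (14500, 3.8983), (15000, 3.8966), (15500, 3.9284), (16000, 3.9334),
    (16500, 3.9357), (17000, 3.9395), (17500, 3.9463), (18000, 3.9484),
    (18500, 3.9486),
]

# Step with the lowest validation loss, and the last logged evaluation.
best_step, best_loss = min(VAL_LOSS, key=lambda row: row[1])
final_step, final_loss = VAL_LOSS[-1]

print(f"best:  step {best_step}, validation loss {best_loss}")
print(f"final: step {final_step}, validation loss {final_loss}")
print(f"gap:   {final_loss - best_loss:.4f}")
```

The minimum is 3.7963 at step 9000 (epoch 4.77), about 0.15 below the final value reported at the top of the card.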
+### Framework versions
+- Transformers 4.26.1
+- PyTorch 1.11.0+cu113
+- Datasets 2.13.0
+- Tokenizers 0.13.3