---
tags:
- generated_from_trainer
model-index:
- name: GPT2-705M
  results: []
---

# GPT2-705M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 5.4628
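A quick way to interpret the reported evaluation loss is to convert it to perplexity, since the `transformers` `Trainer` reports mean cross-entropy in nats. This is a small sketch of that conversion, not part of the original card:

```python
import math

# Evaluation cross-entropy loss reported above
eval_loss = 5.4628

# Perplexity is the exponential of the mean cross-entropy loss
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 235.8
```

A perplexity in the mid-200s is high for English text, consistent with a small model trained from a poor starting point or on limited data.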

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.00025
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 40
- mixed_precision_training: Native AMP
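The effective batch size follows from the list above (16 per device × 8 accumulation steps = 128), and the learning-rate schedule is linear warmup for 50 steps followed by cosine decay. The sketch below re-implements that schedule in plain Python to show its shape; it mirrors the behavior of `get_cosine_schedule_with_warmup` in `transformers` but is a standalone illustration, and `total_steps` is a hypothetical value since the total number of optimizer steps is not stated in the card:

```python
import math

def cosine_lr_with_warmup(step, peak_lr=2.5e-4, warmup_steps=50, total_steps=1000):
    """Learning rate at a given optimizer step (total_steps is an assumed value)."""
    # Linear warmup: ramp from 0 up to peak_lr over the first warmup_steps
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # Cosine decay: fall from peak_lr to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Schedule shape: 0 at step 0, peak at the end of warmup, ~0 at the end
for step in (0, 25, 50, 500, 1000):
    print(step, cosine_lr_with_warmup(step))
```

Cosine decay with a short warmup is a common default for GPT-style pretraining; the warmup avoids unstable early updates at the relatively high peak rate of 2.5e-4.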

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 9.7135        | 0.57  | 1    | 9.7272          |
| 8.0222        | 1.71  | 3    | 9.3213          |
| 7.6063        | 2.86  | 5    | 8.5841          |
| 7.5596        | 4.0   | 7    | 7.9271          |
| 7.4194        | 4.57  | 8    | 8.0942          |
| 7.1644        | 5.71  | 10   | 7.5409          |
| 6.8531        | 6.86  | 12   | 7.3028          |
| 6.3614        | 8.0   | 14   | 9.3796          |
| 8.5129        | 8.57  | 15   | 7.6361          |
| 6.1325        | 9.71  | 17   | 6.7577          |
| 5.8526        | 10.86 | 19   | 6.5249          |
| 5.5941        | 12.0  | 21   | 6.2490          |
| 5.4307        | 12.57 | 22   | 6.2442          |
| 5.1381        | 13.71 | 24   | 5.9595          |
| 4.8705        | 14.86 | 26   | 5.8944          |
| 4.7083        | 16.0  | 28   | 5.7005          |
| 4.5355        | 16.57 | 29   | 5.7459          |
| 4.4187        | 17.71 | 31   | 5.5387          |
| 4.3123        | 18.86 | 33   | 5.4863          |
| 4.0269        | 20.0  | 35   | 5.3277          |
| 3.942         | 20.57 | 36   | 5.3274          |
| 3.784         | 21.71 | 38   | 5.3998          |
| 3.4991        | 22.86 | 40   | 5.4628          |

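One detail worth noticing in the results above: the final validation loss (5.4628 at step 40) is not the best one observed. The snippet below re-reads the table programmatically to locate the minimum; the values are copied verbatim from the table, and whether a checkpoint was actually saved at that step is not stated in the card:

```python
# (step, validation_loss) pairs copied from the training-results table above
eval_history = [
    (1, 9.7272), (3, 9.3213), (5, 8.5841), (7, 7.9271), (8, 8.0942),
    (10, 7.5409), (12, 7.3028), (14, 9.3796), (15, 7.6361), (17, 6.7577),
    (19, 6.5249), (21, 6.2490), (22, 6.2442), (24, 5.9595), (26, 5.8944),
    (28, 5.7005), (29, 5.7459), (31, 5.5387), (33, 5.4863), (35, 5.3277),
    (36, 5.3274), (38, 5.3998), (40, 5.4628),
]

# Validation loss bottoms out before the end of training and drifts back up
best_step, best_loss = min(eval_history, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 36 5.3274
```

The uptick after step 36 suggests mild overfitting toward the end of the run, so an earlier checkpoint may be preferable to the final one if intermediate checkpoints were kept.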
### Framework versions

- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0