gpt2_tiny_baby_50M_32768_42

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7715
  • Accuracy: 0.3332
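Since the evaluation loss is a per-token cross-entropy, the corresponding perplexity can be obtained as exp(loss). A minimal sketch of that conversion (the formula is standard practice, not stated in this card):

```python
import math

# Evaluation cross-entropy loss reported above.
eval_loss = 3.7715

# Perplexity is the exponential of the per-token cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # roughly 43.4
```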

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
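With a linear scheduler, 40,000 warmup steps, and 100,000 total steps, the learning rate ramps linearly from 0 to the peak of 1e-4 and then decays linearly back to 0. A small sketch of that schedule (mirroring the behavior of the `transformers` linear scheduler; the function name is illustrative):

```python
def linear_lr(step, peak_lr=1e-4, warmup_steps=40_000, training_steps=100_000):
    """Linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase: scale by the fraction of warmup completed.
        return peak_lr * step / warmup_steps
    # Decay phase: scale by the fraction of post-warmup steps remaining.
    remaining = training_steps - step
    return peak_lr * max(0.0, remaining / (training_steps - warmup_steps))

# The peak is reached exactly at the end of warmup, and the rate hits 0
# at the final training step.
print(linear_lr(40_000))   # 1e-4
print(linear_lr(100_000))  # 0.0
```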

Training results

| Training Loss | Epoch | Step   | Validation Loss | Accuracy |
|:-------------:|:-----:|:------:|:---------------:|:--------:|
| 8.283         | 0.43  | 2000   | 8.1606          | 0.1193   |
| 5.6979        | 0.87  | 4000   | 6.2471          | 0.1794   |
| 4.7397        | 1.3   | 6000   | 5.5927          | 0.2038   |
| 4.4648        | 1.73  | 8000   | 5.3240          | 0.2134   |
| 4.304         | 2.17  | 10000  | 5.1259          | 0.2224   |
| 4.1746        | 2.6   | 12000  | 4.9563          | 0.2346   |
| 4.0534        | 3.03  | 14000  | 4.8167          | 0.2452   |
| 3.9553        | 3.47  | 16000  | 4.6929          | 0.2529   |
| 3.8577        | 3.9   | 18000  | 4.5889          | 0.2604   |
| 3.7753        | 4.33  | 20000  | 4.5094          | 0.2662   |
| 3.6998        | 4.77  | 22000  | 4.4154          | 0.2721   |
| 3.6369        | 5.2   | 24000  | 4.3482          | 0.2773   |
| 3.5876        | 5.64  | 26000  | 4.2805          | 0.2822   |
| 3.5287        | 6.07  | 28000  | 4.2261          | 0.2864   |
| 3.4839        | 6.5   | 30000  | 4.1755          | 0.2907   |
| 3.4458        | 6.94  | 32000  | 4.1339          | 0.2953   |
| 3.4049        | 7.37  | 34000  | 4.1012          | 0.2991   |
| 3.369         | 7.8   | 36000  | 4.0649          | 0.3019   |
| 3.3284        | 8.24  | 38000  | 4.0342          | 0.3044   |
| 3.316         | 8.67  | 40000  | 4.0054          | 0.3075   |
| 3.2823        | 9.1   | 42000  | 3.9841          | 0.3102   |
| 3.258         | 9.54  | 44000  | 3.9635          | 0.3127   |
| 3.2452        | 9.97  | 46000  | 3.9409          | 0.3150   |
| 3.2171        | 10.4  | 48000  | 3.9224          | 0.3161   |
| 3.2129        | 10.84 | 50000  | 3.9127          | 0.3176   |
| 3.1892        | 11.27 | 52000  | 3.8949          | 0.3196   |
| 3.1881        | 11.7  | 54000  | 3.8819          | 0.3205   |
| 3.1666        | 12.14 | 56000  | 3.8752          | 0.3220   |
| 3.1579        | 12.57 | 58000  | 3.8623          | 0.3230   |
| 3.167         | 13.0  | 60000  | 3.8587          | 0.3239   |
| 3.1434        | 13.44 | 62000  | 3.8529          | 0.3233   |
| 3.1465        | 13.87 | 64000  | 3.8404          | 0.3256   |
| 3.1154        | 14.3  | 66000  | 3.8405          | 0.3253   |
| 3.1263        | 14.74 | 68000  | 3.8275          | 0.3269   |
| 3.1178        | 15.17 | 70000  | 3.8242          | 0.3271   |
| 3.1115        | 15.6  | 72000  | 3.8151          | 0.3280   |
| 3.1085        | 16.04 | 74000  | 3.8117          | 0.3287   |
| 3.0968        | 16.47 | 76000  | 3.8081          | 0.3288   |
| 3.1017        | 16.91 | 78000  | 3.8005          | 0.3298   |
| 3.0839        | 17.34 | 80000  | 3.8010          | 0.3296   |
| 3.0855        | 17.77 | 82000  | 3.7976          | 0.3305   |
| 3.0794        | 18.21 | 84000  | 3.7902          | 0.3313   |
| 3.0728        | 18.64 | 86000  | 3.7906          | 0.3309   |
| 3.0754        | 19.07 | 88000  | 3.7862          | 0.3313   |
| 3.0675        | 19.51 | 90000  | 3.7835          | 0.3321   |
| 3.0707        | 19.94 | 92000  | 3.7769          | 0.3330   |
| 3.0606        | 20.37 | 94000  | 3.7770          | 0.3324   |
| 3.0614        | 20.81 | 96000  | 3.7753          | 0.3327   |
| 3.0507        | 21.24 | 98000  | 3.7728          | 0.3330   |
| 3.0538        | 21.67 | 100000 | 3.7715          | 0.3332   |

Framework versions

  • Transformers 4.30.2
  • Pytorch 2.0.0+cu117
  • Datasets 4.1.1
  • Tokenizers 0.13.3