gpt2_tiny_baby_30M_32768_42

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.7889
  • Accuracy: 0.3364
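
For reference, assuming the reported loss is the mean per-token cross-entropy in nats (the Trainer's usual convention; the card does not say), it corresponds to a perplexity of about 44.2:

```python
import math

# Assumption: the evaluation loss is mean cross-entropy per token, in nats.
eval_loss = 3.7889
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.1f}")  # ~44.2
```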

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 40000
  • training_steps: 100000
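
As a minimal sketch, these settings map onto `transformers.TrainingArguments` roughly as follows. The output directory is a placeholder, the batch sizes are assumed to be per device, and the model and data pipeline are not documented in this card:

```python
from transformers import TrainingArguments

# Sketch only: reproduces the hyperparameters listed above.
# "output_dir" is a placeholder; the dataset and model config are not
# described in this card.
training_args = TrainingArguments(
    output_dir="gpt2_tiny_baby_30M_32768_42",
    learning_rate=1e-4,
    per_device_train_batch_size=32,   # assumed per-device
    per_device_eval_batch_size=32,    # assumed per-device
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=40_000,
    max_steps=100_000,
)
```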

Training results

| Training Loss | Epoch | Step   | Validation Loss | Accuracy |
|:-------------:|:-----:|:------:|:---------------:|:--------:|
| 8.2793        | 0.74  | 2000   | 8.1124          | 0.1381   |
| 5.66          | 1.48  | 4000   | 6.1510          | 0.1897   |
| 4.6738        | 2.22  | 6000   | 5.4672          | 0.2165   |
| 4.4223        | 2.96  | 8000   | 5.2036          | 0.2257   |
| 4.2575        | 3.7   | 10000  | 5.0124          | 0.2348   |
| 4.1117        | 4.44  | 12000  | 4.8397          | 0.2489   |
| 4.0061        | 5.18  | 14000  | 4.6988          | 0.2601   |
| 3.891         | 5.92  | 16000  | 4.5760          | 0.2685   |
| 3.8028        | 6.66  | 18000  | 4.4775          | 0.2744   |
| 3.7241        | 7.4   | 20000  | 4.3932          | 0.2806   |
| 3.6535        | 8.14  | 22000  | 4.3117          | 0.2869   |
| 3.5996        | 8.88  | 24000  | 4.2564          | 0.2908   |
| 3.528         | 9.62  | 26000  | 4.2035          | 0.2957   |
| 3.4719        | 10.36 | 28000  | 4.1553          | 0.2992   |
| 3.433         | 11.1  | 30000  | 4.1164          | 0.3018   |
| 3.3897        | 11.84 | 32000  | 4.0715          | 0.3069   |
| 3.351         | 12.58 | 34000  | 4.0411          | 0.3101   |
| 3.2992        | 13.32 | 36000  | 4.0163          | 0.3128   |
| 3.2833        | 14.06 | 38000  | 3.9909          | 0.3149   |
| 3.2566        | 14.8  | 40000  | 3.9706          | 0.3168   |
| 3.2263        | 15.54 | 42000  | 3.9436          | 0.3194   |
| 3.198         | 16.28 | 44000  | 3.9330          | 0.3208   |
| 3.1846        | 17.02 | 46000  | 3.9197          | 0.3216   |
| 3.1692        | 17.76 | 48000  | 3.8990          | 0.3245   |
| 3.142         | 18.5  | 50000  | 3.8920          | 0.3249   |
| 3.1276        | 19.24 | 52000  | 3.8815          | 0.3258   |
| 3.1248        | 19.98 | 54000  | 3.8705          | 0.3267   |
| 3.1057        | 20.72 | 56000  | 3.8623          | 0.3281   |
| 3.097         | 21.46 | 58000  | 3.8542          | 0.3282   |
| 3.0836        | 22.2  | 60000  | 3.8472          | 0.3294   |
| 3.0856        | 22.94 | 62000  | 3.8448          | 0.3293   |
| 3.0697        | 23.68 | 64000  | 3.8378          | 0.3311   |
| 3.0515        | 24.42 | 66000  | 3.8402          | 0.3304   |
| 3.0551        | 25.16 | 68000  | 3.8369          | 0.3311   |
| 3.0508        | 25.9  | 70000  | 3.8256          | 0.3316   |
| 3.0337        | 26.64 | 72000  | 3.8208          | 0.3323   |
| 3.0314        | 27.38 | 74000  | 3.8219          | 0.3324   |
| 3.034         | 28.12 | 76000  | 3.8145          | 0.3334   |
| 3.0297        | 28.86 | 78000  | 3.8122          | 0.3335   |
| 3.0164        | 29.6  | 80000  | 3.8123          | 0.3338   |
| 3.0121        | 30.34 | 82000  | 3.8085          | 0.3342   |
| 3.0106        | 31.08 | 84000  | 3.8049          | 0.3339   |
| 3.0019        | 31.82 | 86000  | 3.7997          | 0.3348   |
| 3.0007        | 32.56 | 88000  | 3.8010          | 0.3350   |
| 2.9967        | 33.3  | 90000  | 3.8010          | 0.3349   |
| 2.9911        | 34.04 | 92000  | 3.7952          | 0.3358   |
| 2.987         | 34.78 | 94000  | 3.7940          | 0.3357   |
| 2.9867        | 35.52 | 96000  | 3.7889          | 0.3364   |
| 2.9788        | 36.26 | 98000  | 3.7899          | 0.3364   |
| 2.9818        | 37.0  | 100000 | 3.7893          | 0.3363   |

The evaluation figures reported at the top of the card match the best validation loss in this table, reached at step 96000.

Framework versions

  • Transformers 4.30.2
  • Pytorch 2.0.0+cu117
  • Datasets 4.1.1
  • Tokenizers 0.13.3
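
Since the card describes a GPT-2-style causal language model trained with Transformers, loading it might look like the sketch below. The card does not give a Hub repo id, so the path is assumed from the model name and may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path: the card does not state where the checkpoint is hosted.
model_id = "gpt2_tiny_baby_30M_32768_42"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```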