pretrained_gpt2_1.5M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5401
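
If this loss is the usual mean token-level cross-entropy (in nats), it corresponds to a validation perplexity of roughly 1.72; that reading of the loss is an assumption on my part, not something stated on the card.

```python
import math

# Perplexity implied by the reported evaluation loss, assuming the loss is a
# mean token-level cross-entropy measured in nats (not stated on the card).
eval_loss = 0.5401
print(math.exp(eval_loss))  # ≈ 1.716
```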

Model description

More information needed

Intended uses & limitations

More information needed
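
Since usage is undocumented, here is a generic causal-LM generation sketch; the model id below is a placeholder for wherever this checkpoint is hosted, and it assumes the tokenizer was saved alongside the weights.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id; substitute the actual location of this checkpoint.
model_id = "pretrained_gpt2_1.5M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```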

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 256
  • eval_batch_size: 256
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 625000
  • num_epochs: 15
  • mixed_precision_training: Native AMP
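
A sketch of the corresponding transformers.TrainingArguments is below; output_dir is a placeholder, and the evaluation cadence is inferred from the 1000-step intervals in the results table rather than stated on the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pretrained_gpt2_1.5M",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=625_000,  # as listed; note this exceeds the ~79k total steps in the results table
    num_train_epochs=15,
    fp16=True,             # "Native AMP" mixed precision
    eval_strategy="steps",
    eval_steps=1000,       # inferred from the evaluation table below
)
```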

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.4776        | 0.1888  | 1000  | 1.8068          |
| 0.3864        | 0.3776  | 2000  | 1.4641          |
| 0.3546        | 0.5664  | 3000  | 1.3386          |
| 0.3287        | 0.7551  | 4000  | 1.2600          |
| 0.3188        | 0.9439  | 5000  | 1.1931          |
| 0.3032        | 1.1327  | 6000  | 1.1311          |
| 0.2907        | 1.3215  | 7000  | 1.0674          |
| 0.2771        | 1.5103  | 8000  | 1.0217          |
| 0.2654        | 1.6991  | 9000  | 0.9736          |
| 0.2521        | 1.8879  | 10000 | 0.9296          |
| 0.2483        | 2.0766  | 11000 | 0.8892          |
| 0.2337        | 2.2654  | 12000 | 0.8559          |
| 0.2331        | 2.4542  | 13000 | 0.8287          |
| 0.2243        | 2.6430  | 14000 | 0.8064          |
| 0.2142        | 2.8318  | 15000 | 0.7833          |
| 0.2071        | 3.0206  | 16000 | 0.7687          |
| 0.2102        | 3.2094  | 17000 | 0.7541          |
| 0.2017        | 3.3981  | 18000 | 0.7455          |
| 0.195         | 3.5869  | 19000 | 0.7298          |
| 0.1953        | 3.7757  | 20000 | 0.7176          |
| 0.193         | 3.9645  | 21000 | 0.7071          |
| 0.1932        | 4.1533  | 22000 | 0.6946          |
| 0.1854        | 4.3421  | 23000 | 0.6880          |
| 0.1854        | 4.5309  | 24000 | 0.6905          |
| 0.182         | 4.7197  | 25000 | 0.6740          |
| 0.1811        | 4.9084  | 26000 | 0.6641          |
| 0.1791        | 5.0972  | 27000 | 0.6600          |
| 0.1738        | 5.2860  | 28000 | 0.6544          |
| 0.1733        | 5.4748  | 29000 | 0.6459          |
| 0.1697        | 5.6636  | 30000 | 0.6436          |
| 0.1731        | 5.8524  | 31000 | 0.6400          |
| 0.1712        | 6.0412  | 32000 | 0.6424          |
| 0.1705        | 6.2299  | 33000 | 0.6275          |
| 0.1664        | 6.4187  | 34000 | 0.6249          |
| 0.1679        | 6.6075  | 35000 | 0.6197          |
| 0.1668        | 6.7963  | 36000 | 0.6191          |
| 0.1663        | 6.9851  | 37000 | 0.6158          |
| 0.1635        | 7.1739  | 38000 | 0.6103          |
| 0.1621        | 7.3627  | 39000 | 0.6061          |
| 0.1628        | 7.5514  | 40000 | 0.6050          |
| 0.163         | 7.7402  | 41000 | 0.6000          |
| 0.1596        | 7.9290  | 42000 | 0.5975          |
| 0.162         | 8.1178  | 43000 | 0.5946          |
| 0.1607        | 8.3066  | 44000 | 0.5939          |
| 0.1582        | 8.4954  | 45000 | 0.5916          |
| 0.1593        | 8.6842  | 46000 | 0.5864          |
| 0.1593        | 8.8729  | 47000 | 0.5865          |
| 0.156         | 9.0617  | 48000 | 0.5836          |
| 0.1544        | 9.2505  | 49000 | 0.5819          |
| 0.1546        | 9.4393  | 50000 | 0.5787          |
| 0.1533        | 9.6281  | 51000 | 0.5751          |
| 0.1529        | 9.8169  | 52000 | 0.5758          |
| 0.1542        | 10.0057 | 53000 | 0.5728          |
| 0.1528        | 10.1944 | 54000 | 0.5714          |
| 0.1502        | 10.3832 | 55000 | 0.5709          |
| 0.1524        | 10.5720 | 56000 | 0.5691          |
| 0.1505        | 10.7608 | 57000 | 0.5691          |
| 0.1514        | 10.9496 | 58000 | 0.5652          |
| 0.147         | 11.1384 | 59000 | 0.5657          |
| 0.149         | 11.3272 | 60000 | 0.5618          |
| 0.1456        | 11.5160 | 61000 | 0.5601          |
| 0.146         | 11.7047 | 62000 | 0.5588          |
| 0.147         | 11.8935 | 63000 | 0.5599          |
| 0.1493        | 12.0823 | 64000 | 0.5557          |
| 0.1481        | 12.2711 | 65000 | 0.5562          |
| 0.1463        | 12.4599 | 66000 | 0.5536          |
| 0.1479        | 12.6487 | 67000 | 0.5532          |
| 0.1458        | 12.8375 | 68000 | 0.5504          |
| 0.1418        | 13.0262 | 69000 | 0.5511          |
| 0.146         | 13.2150 | 70000 | 0.5496          |
| 0.1452        | 13.4038 | 71000 | 0.5477          |
| 0.144         | 13.5926 | 72000 | 0.5486          |
| 0.1423        | 13.7814 | 73000 | 0.5457          |
| 0.1438        | 13.9702 | 74000 | 0.5451          |
| 0.1461        | 14.1590 | 75000 | 0.5443          |
| 0.1442        | 14.3477 | 76000 | 0.5439          |
| 0.1409        | 14.5365 | 77000 | 0.5410          |
| 0.1443        | 14.7253 | 78000 | 0.5401          |
| 0.1404        | 14.9141 | 79000 | 0.5404          |

Framework versions

  • Transformers 4.51.1
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1
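
To reproduce this environment, the listed versions can be pinned directly, e.g. pip install transformers==4.51.1 torch==2.6.0 datasets==3.5.0 tokenizers==0.21.1; note that the +cu124 PyTorch build may additionally require PyTorch's CUDA-specific wheel index.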

Model size

  • 85.2M parameters (F32 tensors, Safetensors format)