# pretrained_gpt2_1.5M
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 0.5401
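
Assuming this is the mean token-level cross-entropy (the standard Trainer evaluation loss for causal language models), it corresponds to an evaluation perplexity of exp(0.5401) ≈ 1.72.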
## Model description
More information needed
## Intended uses & limitations
More information needed
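
As a minimal usage sketch, assuming the checkpoint is published under this repository name and uses the standard GPT-2 architecture and tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository id is an assumption; replace it with the actual hub path.
model_id = "pretrained_gpt2_1.5M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Sample a short continuation from the model.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```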
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 256
- eval_batch_size: 256
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 625000
- num_epochs: 15
- mixed_precision_training: Native AMP
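
As a sketch of how these hyperparameters map onto `TrainingArguments` in Transformers 4.51 (the actual training script is not included in this card; `model`, `train_dataset`, and `eval_dataset` are assumed to exist):

```python
from transformers import Trainer, TrainingArguments

# Values are taken directly from the hyperparameter list above.
args = TrainingArguments(
    output_dir="pretrained_gpt2_1.5M",
    learning_rate=5e-5,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=256,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=625000,
    num_train_epochs=15,
    fp16=True,  # "Native AMP" mixed-precision training
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```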
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.4776 | 0.1888 | 1000 | 1.8068 |
| 0.3864 | 0.3776 | 2000 | 1.4641 |
| 0.3546 | 0.5664 | 3000 | 1.3386 |
| 0.3287 | 0.7551 | 4000 | 1.2600 |
| 0.3188 | 0.9439 | 5000 | 1.1931 |
| 0.3032 | 1.1327 | 6000 | 1.1311 |
| 0.2907 | 1.3215 | 7000 | 1.0674 |
| 0.2771 | 1.5103 | 8000 | 1.0217 |
| 0.2654 | 1.6991 | 9000 | 0.9736 |
| 0.2521 | 1.8879 | 10000 | 0.9296 |
| 0.2483 | 2.0766 | 11000 | 0.8892 |
| 0.2337 | 2.2654 | 12000 | 0.8559 |
| 0.2331 | 2.4542 | 13000 | 0.8287 |
| 0.2243 | 2.6430 | 14000 | 0.8064 |
| 0.2142 | 2.8318 | 15000 | 0.7833 |
| 0.2071 | 3.0206 | 16000 | 0.7687 |
| 0.2102 | 3.2094 | 17000 | 0.7541 |
| 0.2017 | 3.3981 | 18000 | 0.7455 |
| 0.195 | 3.5869 | 19000 | 0.7298 |
| 0.1953 | 3.7757 | 20000 | 0.7176 |
| 0.193 | 3.9645 | 21000 | 0.7071 |
| 0.1932 | 4.1533 | 22000 | 0.6946 |
| 0.1854 | 4.3421 | 23000 | 0.6880 |
| 0.1854 | 4.5309 | 24000 | 0.6905 |
| 0.182 | 4.7197 | 25000 | 0.6740 |
| 0.1811 | 4.9084 | 26000 | 0.6641 |
| 0.1791 | 5.0972 | 27000 | 0.6600 |
| 0.1738 | 5.2860 | 28000 | 0.6544 |
| 0.1733 | 5.4748 | 29000 | 0.6459 |
| 0.1697 | 5.6636 | 30000 | 0.6436 |
| 0.1731 | 5.8524 | 31000 | 0.6400 |
| 0.1712 | 6.0412 | 32000 | 0.6424 |
| 0.1705 | 6.2299 | 33000 | 0.6275 |
| 0.1664 | 6.4187 | 34000 | 0.6249 |
| 0.1679 | 6.6075 | 35000 | 0.6197 |
| 0.1668 | 6.7963 | 36000 | 0.6191 |
| 0.1663 | 6.9851 | 37000 | 0.6158 |
| 0.1635 | 7.1739 | 38000 | 0.6103 |
| 0.1621 | 7.3627 | 39000 | 0.6061 |
| 0.1628 | 7.5514 | 40000 | 0.6050 |
| 0.163 | 7.7402 | 41000 | 0.6000 |
| 0.1596 | 7.9290 | 42000 | 0.5975 |
| 0.162 | 8.1178 | 43000 | 0.5946 |
| 0.1607 | 8.3066 | 44000 | 0.5939 |
| 0.1582 | 8.4954 | 45000 | 0.5916 |
| 0.1593 | 8.6842 | 46000 | 0.5864 |
| 0.1593 | 8.8729 | 47000 | 0.5865 |
| 0.156 | 9.0617 | 48000 | 0.5836 |
| 0.1544 | 9.2505 | 49000 | 0.5819 |
| 0.1546 | 9.4393 | 50000 | 0.5787 |
| 0.1533 | 9.6281 | 51000 | 0.5751 |
| 0.1529 | 9.8169 | 52000 | 0.5758 |
| 0.1542 | 10.0057 | 53000 | 0.5728 |
| 0.1528 | 10.1944 | 54000 | 0.5714 |
| 0.1502 | 10.3832 | 55000 | 0.5709 |
| 0.1524 | 10.5720 | 56000 | 0.5691 |
| 0.1505 | 10.7608 | 57000 | 0.5691 |
| 0.1514 | 10.9496 | 58000 | 0.5652 |
| 0.147 | 11.1384 | 59000 | 0.5657 |
| 0.149 | 11.3272 | 60000 | 0.5618 |
| 0.1456 | 11.5160 | 61000 | 0.5601 |
| 0.146 | 11.7047 | 62000 | 0.5588 |
| 0.147 | 11.8935 | 63000 | 0.5599 |
| 0.1493 | 12.0823 | 64000 | 0.5557 |
| 0.1481 | 12.2711 | 65000 | 0.5562 |
| 0.1463 | 12.4599 | 66000 | 0.5536 |
| 0.1479 | 12.6487 | 67000 | 0.5532 |
| 0.1458 | 12.8375 | 68000 | 0.5504 |
| 0.1418 | 13.0262 | 69000 | 0.5511 |
| 0.146 | 13.2150 | 70000 | 0.5496 |
| 0.1452 | 13.4038 | 71000 | 0.5477 |
| 0.144 | 13.5926 | 72000 | 0.5486 |
| 0.1423 | 13.7814 | 73000 | 0.5457 |
| 0.1438 | 13.9702 | 74000 | 0.5451 |
| 0.1461 | 14.1590 | 75000 | 0.5443 |
| 0.1442 | 14.3477 | 76000 | 0.5439 |
| 0.1409 | 14.5365 | 77000 | 0.5410 |
| 0.1443 | 14.7253 | 78000 | 0.5401 |
| 0.1404 | 14.9141 | 79000 | 0.5404 |
### Framework versions
- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1