# flan-t5la-small
This model is a fine-tuned version of hrezaei/flan-t5la-small on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:
- Perplexity: 7.7313
- Loss: 2.0453
- Accuracy: 0.0032
- Lookahead Perplexity: 49.1518
- Lookahead Loss: 3.8949
- Base Perplexity: 1.2146
- Base Loss: 0.1944
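The loss/perplexity pairs above are consistent with each perplexity being the exponential of the corresponding cross-entropy loss. A quick sanity check (assuming natural-log losses, the usual `transformers` convention):

```python
import math

# Eval metrics as reported above: each perplexity should be exp(loss)
pairs = {
    "eval": (2.0453, 7.7313),
    "lookahead": (3.8949, 49.1518),
    "base": (0.1944, 1.2146),
}

for name, (loss, reported_ppl) in pairs.items():
    recomputed = math.exp(loss)
    print(f"{name}: exp({loss}) = {recomputed:.4f} (reported {reported_ppl})")
```

All three recomputed values match the reported perplexities to within rounding of the logged loss.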
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- training_steps: 524288
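A minimal sketch of the effective batch size and the linear schedule these settings imply. Warmup steps are not listed in the card, so the sketch assumes zero warmup; the actual run may have used a different value.

```python
# Effective batch size: per-device batch * number of devices
# (no gradient accumulation is listed in the card)
train_batch_size, num_devices = 16, 2
total_train_batch_size = train_batch_size * num_devices  # 32, matching the card

def linear_lr(step, base_lr=5e-5, total_steps=524288, warmup_steps=0):
    """Linear schedule in the style of transformers'
    get_linear_schedule_with_warmup: ramp up over warmup_steps,
    then decay linearly to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(total_train_batch_size)   # effective batch of 32
print(linear_lr(0))             # base LR at step 0 with zero warmup
print(linear_lr(262144))        # half the base LR at the halfway point
```

With zero warmup the learning rate starts at 5e-05 and decays linearly to zero at step 524288.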
### Training results
| Training Loss | Epoch | Step | Perplexity | Validation Loss | Accuracy | Lookahead Perplexity | Lookahead Loss | Base Perplexity | Base Loss |
|---|---|---|---|---|---|---|---|---|---|
| 4.8052 | 0.0095 | 5000 | 250.1917 | 5.5222 | 0.0032 | 51687.8881 | 10.8530 | 1.2146 | 0.1944 |
| 4.1668 | 0.0191 | 10000 | 88.2828 | 4.4805 | 0.0032 | 6430.9941 | 8.7689 | 1.2146 | 0.1944 |
| 3.7641 | 0.0286 | 15000 | 48.8034 | 3.8878 | 0.0032 | 1966.0577 | 7.5838 | 1.2146 | 0.1944 |
| 3.5174 | 0.0381 | 20000 | 33.9964 | 3.5263 | 0.0032 | 953.9463 | 6.8606 | 1.2146 | 0.1944 |
| 3.3815 | 0.0477 | 25000 | 26.7476 | 3.2864 | 0.0032 | 590.2392 | 6.3805 | 1.2146 | 0.1944 |
| 3.2612 | 0.0572 | 30000 | 22.4741 | 3.1124 | 0.0032 | 416.4622 | 6.0318 | 1.2146 | 0.1944 |
| 3.1695 | 0.0668 | 35000 | 19.6661 | 2.9789 | 0.0032 | 318.7243 | 5.7643 | 1.2146 | 0.1944 |
| 3.1099 | 0.0763 | 40000 | 17.6862 | 2.8728 | 0.0032 | 257.6626 | 5.5517 | 1.2146 | 0.1944 |
| 3.0454 | 0.0858 | 45000 | 16.2331 | 2.7871 | 0.0032 | 216.9703 | 5.3798 | 1.2146 | 0.1944 |
| 2.9932 | 0.0954 | 50000 | 15.1058 | 2.7151 | 0.0032 | 187.8344 | 5.2356 | 1.2146 | 0.1944 |
| 2.9685 | 0.1049 | 55000 | 14.2347 | 2.6557 | 0.0032 | 166.7505 | 5.1165 | 1.2146 | 0.1944 |
| 2.9146 | 0.1144 | 60000 | 13.5266 | 2.6047 | 0.0032 | 150.5264 | 5.0141 | 1.2146 | 0.1944 |
| 2.8939 | 1.0048 | 65000 | 12.9238 | 2.5591 | 0.0032 | 137.3885 | 4.9228 | 1.2146 | 0.1944 |
| 2.8667 | 1.0143 | 70000 | 12.4200 | 2.5193 | 0.0032 | 126.8708 | 4.8432 | 1.2146 | 0.1944 |
| 2.8495 | 1.0238 | 75000 | 11.9951 | 2.4845 | 0.0032 | 118.3322 | 4.7735 | 1.2146 | 0.1944 |
| 2.8226 | 1.0334 | 80000 | 11.6344 | 2.4540 | 0.0032 | 111.3047 | 4.7123 | 1.2146 | 0.1944 |
| 2.8131 | 1.0429 | 85000 | 11.3094 | 2.4256 | 0.0032 | 105.1683 | 4.6556 | 1.2146 | 0.1944 |
| 2.7982 | 1.0525 | 90000 | 11.0316 | 2.4008 | 0.0032 | 100.0606 | 4.6058 | 1.2146 | 0.1944 |
| 2.7809 | 1.0620 | 95000 | 10.7938 | 2.3790 | 0.0032 | 95.7859 | 4.5621 | 1.2146 | 0.1944 |
| 2.7659 | 1.0715 | 100000 | 10.5758 | 2.3586 | 0.0032 | 91.9492 | 4.5212 | 1.2146 | 0.1944 |
| 2.7617 | 1.0811 | 105000 | 10.3792 | 2.3398 | 0.0032 | 88.5628 | 4.4837 | 1.2146 | 0.1944 |
| 2.7458 | 1.0906 | 110000 | 10.1936 | 2.3218 | 0.0032 | 85.4272 | 4.4477 | 1.2146 | 0.1944 |
| 2.7383 | 1.1001 | 115000 | 10.0342 | 2.3060 | 0.0032 | 82.7731 | 4.4161 | 1.2146 | 0.1944 |
| 2.7269 | 1.1097 | 120000 | 9.8958 | 2.2921 | 0.0032 | 80.4986 | 4.3882 | 1.2146 | 0.1944 |
| 2.7131 | 1.1192 | 125000 | 9.7566 | 2.2779 | 0.0032 | 78.2504 | 4.3599 | 1.2146 | 0.1944 |
| 2.6989 | 2.0095 | 130000 | 9.6413 | 2.2661 | 0.0032 | 76.4115 | 4.3361 | 1.2146 | 0.1944 |
| 2.7099 | 2.0191 | 135000 | 9.5230 | 2.2537 | 0.0032 | 74.5514 | 4.3115 | 1.2146 | 0.1944 |
| 2.7143 | 2.0286 | 140000 | 9.4245 | 2.2433 | 0.0032 | 73.0154 | 4.2907 | 1.2146 | 0.1944 |
| 2.682 | 2.0381 | 145000 | 9.3258 | 2.2328 | 0.0032 | 71.4968 | 4.2697 | 1.2146 | 0.1944 |
| 2.7051 | 2.0477 | 150000 | 9.2432 | 2.2239 | 0.0032 | 70.2333 | 4.2518 | 1.2146 | 0.1944 |
| 2.6935 | 2.0572 | 155000 | 9.1599 | 2.2148 | 0.0032 | 68.9760 | 4.2338 | 1.2146 | 0.1944 |
| 2.6733 | 2.0668 | 160000 | 9.0837 | 2.2065 | 0.0032 | 67.8339 | 4.2171 | 1.2146 | 0.1944 |
| 2.6706 | 2.0763 | 165000 | 9.0112 | 2.1985 | 0.0032 | 66.7535 | 4.2010 | 1.2146 | 0.1944 |
| 2.6538 | 2.0858 | 170000 | 8.9443 | 2.1910 | 0.0032 | 65.7671 | 4.1861 | 1.2146 | 0.1944 |
| 2.6397 | 2.0954 | 175000 | 8.8802 | 2.1838 | 0.0032 | 64.8308 | 4.1718 | 1.2146 | 0.1944 |
| 2.6608 | 2.1049 | 180000 | 8.8237 | 2.1774 | 0.0032 | 64.0079 | 4.1590 | 1.2146 | 0.1944 |
| 2.6339 | 2.1144 | 185000 | 8.7756 | 2.1720 | 0.0032 | 63.3068 | 4.1480 | 1.2146 | 0.1944 |
| 2.6327 | 3.0048 | 190000 | 8.7198 | 2.1656 | 0.0032 | 62.5058 | 4.1353 | 1.2146 | 0.1944 |
| 2.6281 | 3.0143 | 195000 | 8.6693 | 2.1598 | 0.0032 | 61.7858 | 4.1237 | 1.2146 | 0.1944 |
| 2.6281 | 3.0238 | 200000 | 8.6218 | 2.1543 | 0.0032 | 61.1157 | 4.1128 | 1.2146 | 0.1944 |
| 2.6206 | 3.0334 | 205000 | 8.5813 | 2.1496 | 0.0032 | 60.5393 | 4.1033 | 1.2146 | 0.1944 |
| 2.6253 | 3.0429 | 210000 | 8.5365 | 2.1443 | 0.0032 | 59.9116 | 4.0929 | 1.2146 | 0.1944 |
| 2.6222 | 3.0525 | 215000 | 8.4976 | 2.1398 | 0.0032 | 59.3662 | 4.0837 | 1.2146 | 0.1944 |
| 2.619 | 3.0620 | 220000 | 8.4646 | 2.1359 | 0.0032 | 58.9064 | 4.0759 | 1.2146 | 0.1944 |
| 2.6154 | 3.0715 | 225000 | 8.4291 | 2.1317 | 0.0032 | 58.4114 | 4.0675 | 1.2146 | 0.1944 |
| 2.6194 | 3.0811 | 230000 | 8.3955 | 2.1277 | 0.0032 | 57.9497 | 4.0596 | 1.2146 | 0.1944 |
| 2.6071 | 3.0906 | 235000 | 8.3591 | 2.1233 | 0.0032 | 57.4509 | 4.0509 | 1.2146 | 0.1944 |
| 2.6073 | 3.1001 | 240000 | 8.3282 | 2.1196 | 0.0032 | 57.0281 | 4.0435 | 1.2146 | 0.1944 |
| 2.6069 | 3.1097 | 245000 | 8.3028 | 2.1166 | 0.0032 | 56.6777 | 4.0374 | 1.2146 | 0.1944 |
| 2.5963 | 3.1192 | 250000 | 8.2717 | 2.1128 | 0.0032 | 56.2549 | 4.0299 | 1.2146 | 0.1944 |
| 2.5939 | 4.0095 | 255000 | 8.2485 | 2.1100 | 0.0032 | 55.9397 | 4.0243 | 1.2146 | 0.1944 |
| 2.6052 | 4.0191 | 260000 | 8.2190 | 2.1065 | 0.0032 | 55.5436 | 4.0172 | 1.2146 | 0.1944 |
| 2.6206 | 4.0286 | 265000 | 8.1968 | 2.1037 | 0.0032 | 55.2421 | 4.0117 | 1.2146 | 0.1944 |
| 2.5901 | 4.0381 | 270000 | 8.1721 | 2.1007 | 0.0032 | 54.9110 | 4.0057 | 1.2146 | 0.1944 |
| 2.6189 | 4.0477 | 275000 | 8.1531 | 2.0984 | 0.0032 | 54.6544 | 4.0010 | 1.2146 | 0.1944 |
| 2.6123 | 4.0572 | 280000 | 8.1302 | 2.0956 | 0.0032 | 54.3509 | 3.9955 | 1.2146 | 0.1944 |
| 2.5939 | 4.0668 | 285000 | 8.1105 | 2.0932 | 0.0032 | 54.0872 | 3.9906 | 1.2146 | 0.1944 |
| 2.5931 | 4.0763 | 290000 | 8.0902 | 2.0907 | 0.0032 | 53.8164 | 3.9856 | 1.2146 | 0.1944 |
| 2.5801 | 4.0858 | 295000 | 8.0712 | 2.0883 | 0.0032 | 53.5642 | 3.9809 | 1.2146 | 0.1944 |
| 2.5703 | 4.0954 | 300000 | 8.0520 | 2.0859 | 0.0032 | 53.3124 | 3.9762 | 1.2146 | 0.1944 |
| 2.5951 | 4.1049 | 305000 | 8.0361 | 2.0839 | 0.0032 | 53.1002 | 3.9722 | 1.2146 | 0.1944 |
| 2.5706 | 4.1144 | 310000 | 8.0240 | 2.0824 | 0.0032 | 52.9377 | 3.9691 | 1.2146 | 0.1944 |
| 2.5721 | 5.0048 | 315000 | 8.0057 | 2.0802 | 0.0032 | 52.6975 | 3.9646 | 1.2146 | 0.1944 |
| 2.5681 | 5.0143 | 320000 | 7.9894 | 2.0781 | 0.0032 | 52.4840 | 3.9605 | 1.2146 | 0.1944 |
| 2.5713 | 5.0238 | 325000 | 7.9746 | 2.0763 | 0.0032 | 52.2925 | 3.9569 | 1.2146 | 0.1944 |
| 2.5678 | 5.0334 | 330000 | 7.9619 | 2.0747 | 0.0032 | 52.1243 | 3.9536 | 1.2146 | 0.1944 |
| 2.5759 | 5.0429 | 335000 | 7.9467 | 2.0728 | 0.0032 | 51.9275 | 3.9498 | 1.2146 | 0.1944 |
| 2.5734 | 5.0525 | 340000 | 7.9344 | 2.0712 | 0.0032 | 51.7658 | 3.9467 | 1.2146 | 0.1944 |
| 2.5723 | 5.0620 | 345000 | 7.9246 | 2.0700 | 0.0032 | 51.6367 | 3.9442 | 1.2146 | 0.1944 |
| 2.5716 | 5.0715 | 350000 | 7.9128 | 2.0685 | 0.0032 | 51.4820 | 3.9412 | 1.2146 | 0.1944 |
| 2.5788 | 5.0811 | 355000 | 7.9013 | 2.0670 | 0.0032 | 51.3336 | 3.9383 | 1.2146 | 0.1944 |
| 2.5638 | 5.0906 | 360000 | 7.8881 | 2.0654 | 0.0032 | 51.1649 | 3.9351 | 1.2146 | 0.1944 |
| 2.5657 | 5.1001 | 365000 | 7.8774 | 2.0640 | 0.0032 | 51.0266 | 3.9323 | 1.2146 | 0.1944 |
| 2.5697 | 5.1097 | 370000 | 7.8698 | 2.0630 | 0.0032 | 50.9255 | 3.9304 | 1.2146 | 0.1944 |
| 2.5598 | 5.1192 | 375000 | 7.8584 | 2.0616 | 0.0032 | 50.7796 | 3.9275 | 1.2146 | 0.1944 |
| 2.5571 | 6.0095 | 380000 | 7.8515 | 2.0607 | 0.0032 | 50.6889 | 3.9257 | 1.2146 | 0.1944 |
| 2.5717 | 6.0191 | 385000 | 7.8408 | 2.0593 | 0.0032 | 50.5542 | 3.9230 | 1.2146 | 0.1944 |
| 2.5886 | 6.0286 | 390000 | 7.8331 | 2.0584 | 0.0032 | 50.4543 | 3.9211 | 1.2146 | 0.1944 |
| 2.5579 | 6.0381 | 395000 | 7.8248 | 2.0573 | 0.0032 | 50.3467 | 3.9189 | 1.2146 | 0.1944 |
| 2.5885 | 6.0477 | 400000 | 7.8188 | 2.0565 | 0.0032 | 50.2690 | 3.9174 | 1.2146 | 0.1944 |
| 2.584 | 6.0572 | 405000 | 7.8107 | 2.0555 | 0.0032 | 50.1654 | 3.9153 | 1.2146 | 0.1944 |
| 2.5663 | 6.0668 | 410000 | 7.8039 | 2.0546 | 0.0032 | 50.0788 | 3.9136 | 1.2146 | 0.1944 |
| 2.5658 | 6.0763 | 415000 | 7.7975 | 2.0538 | 0.0032 | 49.9948 | 3.9119 | 1.2146 | 0.1944 |
| 2.5549 | 6.0858 | 420000 | 7.7909 | 2.0530 | 0.0032 | 49.9114 | 3.9102 | 1.2146 | 0.1944 |
| 2.5445 | 6.0954 | 425000 | 7.7842 | 2.0521 | 0.0032 | 49.8271 | 3.9086 | 1.2146 | 0.1944 |
| 2.5732 | 6.1049 | 430000 | 7.7799 | 2.0515 | 0.0032 | 49.7709 | 3.9074 | 1.2146 | 0.1944 |
| 2.5483 | 6.1144 | 435000 | 7.7769 | 2.0512 | 0.0032 | 49.7301 | 3.9066 | 1.2146 | 0.1944 |
| 2.5494 | 7.0048 | 440000 | 7.7702 | 2.0503 | 0.0032 | 49.6461 | 3.9049 | 1.2146 | 0.1944 |
| 2.5482 | 7.0143 | 445000 | 7.7655 | 2.0497 | 0.0032 | 49.5863 | 3.9037 | 1.2146 | 0.1944 |
| 2.5514 | 7.0238 | 450000 | 7.7611 | 2.0491 | 0.0032 | 49.5319 | 3.9026 | 1.2146 | 0.1944 |
| 2.549 | 7.0334 | 455000 | 7.7576 | 2.0487 | 0.0032 | 49.4864 | 3.9017 | 1.2146 | 0.1944 |
| 2.5567 | 7.0429 | 460000 | 7.7537 | 2.0482 | 0.0032 | 49.4372 | 3.9007 | 1.2146 | 0.1944 |
| 2.5555 | 7.0525 | 465000 | 7.7504 | 2.0477 | 0.0032 | 49.3947 | 3.8998 | 1.2146 | 0.1944 |
| 2.5564 | 7.0620 | 470000 | 7.7482 | 2.0475 | 0.0032 | 49.3660 | 3.8993 | 1.2146 | 0.1944 |
| 2.5542 | 7.0715 | 475000 | 7.7453 | 2.0471 | 0.0032 | 49.3283 | 3.8985 | 1.2146 | 0.1944 |
| 2.5627 | 7.0811 | 480000 | 7.7427 | 2.0468 | 0.0032 | 49.2958 | 3.8978 | 1.2146 | 0.1944 |
| 2.5511 | 7.0906 | 485000 | 7.7396 | 2.0463 | 0.0032 | 49.2562 | 3.8970 | 1.2146 | 0.1944 |
| 2.5533 | 7.1001 | 490000 | 7.7372 | 2.0460 | 0.0032 | 49.2271 | 3.8964 | 1.2146 | 0.1944 |
| 2.557 | 7.1097 | 495000 | 7.7363 | 2.0459 | 0.0032 | 49.2147 | 3.8962 | 1.2146 | 0.1944 |
| 2.5467 | 7.1192 | 500000 | 7.7347 | 2.0457 | 0.0032 | 49.1938 | 3.8958 | 1.2146 | 0.1944 |
| 2.5482 | 8.0095 | 505000 | 7.7332 | 2.0455 | 0.0032 | 49.1749 | 3.8954 | 1.2146 | 0.1944 |
| 2.5599 | 8.0191 | 510000 | 7.7323 | 2.0454 | 0.0032 | 49.1640 | 3.8952 | 1.2146 | 0.1944 |
| 2.5788 | 8.0286 | 515000 | 7.7318 | 2.0453 | 0.0032 | 49.1576 | 3.8950 | 1.2146 | 0.1944 |
| 2.5479 | 8.0381 | 520000 | 7.7314 | 2.0453 | 0.0032 | 49.1523 | 3.8949 | 1.2146 | 0.1944 |
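A back-of-the-envelope check on the curve above shows how front-loaded the improvement is, using eval losses taken straight from the table (steps 5000, 250000, and 520000):

```python
# Eval losses copied from the training-results table above
first = 5.5222   # step 5000 (first logged checkpoint)
mid   = 2.1128   # step 250000 (roughly half of training)
last  = 2.0453   # step 520000 (final logged checkpoint)

# Fraction of the total eval-loss reduction already achieved at the midpoint
frac = (first - mid) / (first - last)
print(f"{frac:.1%} of the eval-loss drop happened in the first half of training")
```

Roughly 98% of the total eval-loss reduction occurs in the first half of the run; the remaining steps yield only marginal gains.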
### Framework versions
- Transformers 4.57.0.dev0
- PyTorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1