
flan-t5la-small

This model is a fine-tuned version of hrezaei/flan-t5la-small on the HuggingFaceFW/fineweb sample-350BT dataset. It achieves the following results on the evaluation set:

  • Perplexity: 7.7313
  • Loss: 2.0453
  • Accuracy: 0.0032
  • Lookahead Perplexity: 49.1518
  • Lookahead Loss: 3.8949
  • Base Perplexity: 1.2146
  • Base Loss: 0.1944
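The reported perplexities follow the usual convention that perplexity is the exponential of the cross-entropy loss. A quick sanity check against the values above:

```python
import math

# Perplexity is exp(cross-entropy loss); check against the reported metrics.
loss = 2.0453            # reported evaluation loss
lookahead_loss = 3.8949  # reported lookahead loss

perplexity = math.exp(loss)                     # close to the reported 7.7313
lookahead_perplexity = math.exp(lookahead_loss) # close to the reported 49.1518
print(round(perplexity, 4), round(lookahead_perplexity, 4))
```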

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • training_steps: 524288
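The derived batch sizes and the linear schedule can be reproduced from the values above. A minimal sketch (assuming zero warmup steps, since none are listed):

```python
# Derived hyperparameters, reproduced from the list above.
train_batch_size = 16
num_devices = 2
total_train_batch_size = train_batch_size * num_devices  # 32, as listed

# Linear decay: lr falls from 5e-05 at step 0 to 0 at training_steps.
# Assumes no warmup phase, since no warmup steps are listed.
learning_rate = 5e-05
training_steps = 524288

def lr_at(step):
    return learning_rate * max(0.0, 1.0 - step / training_steps)

print(total_train_batch_size)
print(lr_at(0))       # initial learning rate, 5e-05
print(lr_at(262144))  # halfway through training
```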

Training results

| Training Loss | Epoch | Step | Perplexity | Validation Loss | Accuracy | Lookahead Perplexity | Lookahead Loss | Base Perplexity | Base Loss |
|:-------------:|:-----:|:----:|:----------:|:---------------:|:--------:|:--------------------:|:--------------:|:---------------:|:---------:|
| 4.8052 | 0.0095 | 5000 | 250.1917 | 5.5222 | 0.0032 | 51687.8881 | 10.8530 | 1.2146 | 0.1944 |
| 4.1668 | 0.0191 | 10000 | 88.2828 | 4.4805 | 0.0032 | 6430.9941 | 8.7689 | 1.2146 | 0.1944 |
| 3.7641 | 0.0286 | 15000 | 48.8034 | 3.8878 | 0.0032 | 1966.0577 | 7.5838 | 1.2146 | 0.1944 |
| 3.5174 | 0.0381 | 20000 | 33.9964 | 3.5263 | 0.0032 | 953.9463 | 6.8606 | 1.2146 | 0.1944 |
| 3.3815 | 0.0477 | 25000 | 26.7476 | 3.2864 | 0.0032 | 590.2392 | 6.3805 | 1.2146 | 0.1944 |
| 3.2612 | 0.0572 | 30000 | 22.4741 | 3.1124 | 0.0032 | 416.4622 | 6.0318 | 1.2146 | 0.1944 |
| 3.1695 | 0.0668 | 35000 | 19.6661 | 2.9789 | 0.0032 | 318.7243 | 5.7643 | 1.2146 | 0.1944 |
| 3.1099 | 0.0763 | 40000 | 17.6862 | 2.8728 | 0.0032 | 257.6626 | 5.5517 | 1.2146 | 0.1944 |
| 3.0454 | 0.0858 | 45000 | 16.2331 | 2.7871 | 0.0032 | 216.9703 | 5.3798 | 1.2146 | 0.1944 |
| 2.9932 | 0.0954 | 50000 | 15.1058 | 2.7151 | 0.0032 | 187.8344 | 5.2356 | 1.2146 | 0.1944 |
| 2.9685 | 0.1049 | 55000 | 14.2347 | 2.6557 | 0.0032 | 166.7505 | 5.1165 | 1.2146 | 0.1944 |
| 2.9146 | 0.1144 | 60000 | 13.5266 | 2.6047 | 0.0032 | 150.5264 | 5.0141 | 1.2146 | 0.1944 |
| 2.8939 | 1.0048 | 65000 | 12.9238 | 2.5591 | 0.0032 | 137.3885 | 4.9228 | 1.2146 | 0.1944 |
| 2.8667 | 1.0143 | 70000 | 12.4200 | 2.5193 | 0.0032 | 126.8708 | 4.8432 | 1.2146 | 0.1944 |
| 2.8495 | 1.0238 | 75000 | 11.9951 | 2.4845 | 0.0032 | 118.3322 | 4.7735 | 1.2146 | 0.1944 |
| 2.8226 | 1.0334 | 80000 | 11.6344 | 2.4540 | 0.0032 | 111.3047 | 4.7123 | 1.2146 | 0.1944 |
| 2.8131 | 1.0429 | 85000 | 11.3094 | 2.4256 | 0.0032 | 105.1683 | 4.6556 | 1.2146 | 0.1944 |
| 2.7982 | 1.0525 | 90000 | 11.0316 | 2.4008 | 0.0032 | 100.0606 | 4.6058 | 1.2146 | 0.1944 |
| 2.7809 | 1.0620 | 95000 | 10.7938 | 2.3790 | 0.0032 | 95.7859 | 4.5621 | 1.2146 | 0.1944 |
| 2.7659 | 1.0715 | 100000 | 10.5758 | 2.3586 | 0.0032 | 91.9492 | 4.5212 | 1.2146 | 0.1944 |
| 2.7617 | 1.0811 | 105000 | 10.3792 | 2.3398 | 0.0032 | 88.5628 | 4.4837 | 1.2146 | 0.1944 |
| 2.7458 | 1.0906 | 110000 | 10.1936 | 2.3218 | 0.0032 | 85.4272 | 4.4477 | 1.2146 | 0.1944 |
| 2.7383 | 1.1001 | 115000 | 10.0342 | 2.3060 | 0.0032 | 82.7731 | 4.4161 | 1.2146 | 0.1944 |
| 2.7269 | 1.1097 | 120000 | 9.8958 | 2.2921 | 0.0032 | 80.4986 | 4.3882 | 1.2146 | 0.1944 |
| 2.7131 | 1.1192 | 125000 | 9.7566 | 2.2779 | 0.0032 | 78.2504 | 4.3599 | 1.2146 | 0.1944 |
| 2.6989 | 2.0095 | 130000 | 9.6413 | 2.2661 | 0.0032 | 76.4115 | 4.3361 | 1.2146 | 0.1944 |
| 2.7099 | 2.0191 | 135000 | 9.5230 | 2.2537 | 0.0032 | 74.5514 | 4.3115 | 1.2146 | 0.1944 |
| 2.7143 | 2.0286 | 140000 | 9.4245 | 2.2433 | 0.0032 | 73.0154 | 4.2907 | 1.2146 | 0.1944 |
| 2.682 | 2.0381 | 145000 | 9.3258 | 2.2328 | 0.0032 | 71.4968 | 4.2697 | 1.2146 | 0.1944 |
| 2.7051 | 2.0477 | 150000 | 9.2432 | 2.2239 | 0.0032 | 70.2333 | 4.2518 | 1.2146 | 0.1944 |
| 2.6935 | 2.0572 | 155000 | 9.1599 | 2.2148 | 0.0032 | 68.9760 | 4.2338 | 1.2146 | 0.1944 |
| 2.6733 | 2.0668 | 160000 | 9.0837 | 2.2065 | 0.0032 | 67.8339 | 4.2171 | 1.2146 | 0.1944 |
| 2.6706 | 2.0763 | 165000 | 9.0112 | 2.1985 | 0.0032 | 66.7535 | 4.2010 | 1.2146 | 0.1944 |
| 2.6538 | 2.0858 | 170000 | 8.9443 | 2.1910 | 0.0032 | 65.7671 | 4.1861 | 1.2146 | 0.1944 |
| 2.6397 | 2.0954 | 175000 | 8.8802 | 2.1838 | 0.0032 | 64.8308 | 4.1718 | 1.2146 | 0.1944 |
| 2.6608 | 2.1049 | 180000 | 8.8237 | 2.1774 | 0.0032 | 64.0079 | 4.1590 | 1.2146 | 0.1944 |
| 2.6339 | 2.1144 | 185000 | 8.7756 | 2.1720 | 0.0032 | 63.3068 | 4.1480 | 1.2146 | 0.1944 |
| 2.6327 | 3.0048 | 190000 | 8.7198 | 2.1656 | 0.0032 | 62.5058 | 4.1353 | 1.2146 | 0.1944 |
| 2.6281 | 3.0143 | 195000 | 8.6693 | 2.1598 | 0.0032 | 61.7858 | 4.1237 | 1.2146 | 0.1944 |
| 2.6281 | 3.0238 | 200000 | 8.6218 | 2.1543 | 0.0032 | 61.1157 | 4.1128 | 1.2146 | 0.1944 |
| 2.6206 | 3.0334 | 205000 | 8.5813 | 2.1496 | 0.0032 | 60.5393 | 4.1033 | 1.2146 | 0.1944 |
| 2.6253 | 3.0429 | 210000 | 8.5365 | 2.1443 | 0.0032 | 59.9116 | 4.0929 | 1.2146 | 0.1944 |
| 2.6222 | 3.0525 | 215000 | 8.4976 | 2.1398 | 0.0032 | 59.3662 | 4.0837 | 1.2146 | 0.1944 |
| 2.619 | 3.0620 | 220000 | 8.4646 | 2.1359 | 0.0032 | 58.9064 | 4.0759 | 1.2146 | 0.1944 |
| 2.6154 | 3.0715 | 225000 | 8.4291 | 2.1317 | 0.0032 | 58.4114 | 4.0675 | 1.2146 | 0.1944 |
| 2.6194 | 3.0811 | 230000 | 8.3955 | 2.1277 | 0.0032 | 57.9497 | 4.0596 | 1.2146 | 0.1944 |
| 2.6071 | 3.0906 | 235000 | 8.3591 | 2.1233 | 0.0032 | 57.4509 | 4.0509 | 1.2146 | 0.1944 |
| 2.6073 | 3.1001 | 240000 | 8.3282 | 2.1196 | 0.0032 | 57.0281 | 4.0435 | 1.2146 | 0.1944 |
| 2.6069 | 3.1097 | 245000 | 8.3028 | 2.1166 | 0.0032 | 56.6777 | 4.0374 | 1.2146 | 0.1944 |
| 2.5963 | 3.1192 | 250000 | 8.2717 | 2.1128 | 0.0032 | 56.2549 | 4.0299 | 1.2146 | 0.1944 |
| 2.5939 | 4.0095 | 255000 | 8.2485 | 2.1100 | 0.0032 | 55.9397 | 4.0243 | 1.2146 | 0.1944 |
| 2.6052 | 4.0191 | 260000 | 8.2190 | 2.1065 | 0.0032 | 55.5436 | 4.0172 | 1.2146 | 0.1944 |
| 2.6206 | 4.0286 | 265000 | 8.1968 | 2.1037 | 0.0032 | 55.2421 | 4.0117 | 1.2146 | 0.1944 |
| 2.5901 | 4.0381 | 270000 | 8.1721 | 2.1007 | 0.0032 | 54.9110 | 4.0057 | 1.2146 | 0.1944 |
| 2.6189 | 4.0477 | 275000 | 8.1531 | 2.0984 | 0.0032 | 54.6544 | 4.0010 | 1.2146 | 0.1944 |
| 2.6123 | 4.0572 | 280000 | 8.1302 | 2.0956 | 0.0032 | 54.3509 | 3.9955 | 1.2146 | 0.1944 |
| 2.5939 | 4.0668 | 285000 | 8.1105 | 2.0932 | 0.0032 | 54.0872 | 3.9906 | 1.2146 | 0.1944 |
| 2.5931 | 4.0763 | 290000 | 8.0902 | 2.0907 | 0.0032 | 53.8164 | 3.9856 | 1.2146 | 0.1944 |
| 2.5801 | 4.0858 | 295000 | 8.0712 | 2.0883 | 0.0032 | 53.5642 | 3.9809 | 1.2146 | 0.1944 |
| 2.5703 | 4.0954 | 300000 | 8.0520 | 2.0859 | 0.0032 | 53.3124 | 3.9762 | 1.2146 | 0.1944 |
| 2.5951 | 4.1049 | 305000 | 8.0361 | 2.0839 | 0.0032 | 53.1002 | 3.9722 | 1.2146 | 0.1944 |
| 2.5706 | 4.1144 | 310000 | 8.0240 | 2.0824 | 0.0032 | 52.9377 | 3.9691 | 1.2146 | 0.1944 |
| 2.5721 | 5.0048 | 315000 | 8.0057 | 2.0802 | 0.0032 | 52.6975 | 3.9646 | 1.2146 | 0.1944 |
| 2.5681 | 5.0143 | 320000 | 7.9894 | 2.0781 | 0.0032 | 52.4840 | 3.9605 | 1.2146 | 0.1944 |
| 2.5713 | 5.0238 | 325000 | 7.9746 | 2.0763 | 0.0032 | 52.2925 | 3.9569 | 1.2146 | 0.1944 |
| 2.5678 | 5.0334 | 330000 | 7.9619 | 2.0747 | 0.0032 | 52.1243 | 3.9536 | 1.2146 | 0.1944 |
| 2.5759 | 5.0429 | 335000 | 7.9467 | 2.0728 | 0.0032 | 51.9275 | 3.9498 | 1.2146 | 0.1944 |
| 2.5734 | 5.0525 | 340000 | 7.9344 | 2.0712 | 0.0032 | 51.7658 | 3.9467 | 1.2146 | 0.1944 |
| 2.5723 | 5.0620 | 345000 | 7.9246 | 2.0700 | 0.0032 | 51.6367 | 3.9442 | 1.2146 | 0.1944 |
| 2.5716 | 5.0715 | 350000 | 7.9128 | 2.0685 | 0.0032 | 51.4820 | 3.9412 | 1.2146 | 0.1944 |
| 2.5788 | 5.0811 | 355000 | 7.9013 | 2.0670 | 0.0032 | 51.3336 | 3.9383 | 1.2146 | 0.1944 |
| 2.5638 | 5.0906 | 360000 | 7.8881 | 2.0654 | 0.0032 | 51.1649 | 3.9351 | 1.2146 | 0.1944 |
| 2.5657 | 5.1001 | 365000 | 7.8774 | 2.0640 | 0.0032 | 51.0266 | 3.9323 | 1.2146 | 0.1944 |
| 2.5697 | 5.1097 | 370000 | 7.8698 | 2.0630 | 0.0032 | 50.9255 | 3.9304 | 1.2146 | 0.1944 |
| 2.5598 | 5.1192 | 375000 | 7.8584 | 2.0616 | 0.0032 | 50.7796 | 3.9275 | 1.2146 | 0.1944 |
| 2.5571 | 6.0095 | 380000 | 7.8515 | 2.0607 | 0.0032 | 50.6889 | 3.9257 | 1.2146 | 0.1944 |
| 2.5717 | 6.0191 | 385000 | 7.8408 | 2.0593 | 0.0032 | 50.5542 | 3.9230 | 1.2146 | 0.1944 |
| 2.5886 | 6.0286 | 390000 | 7.8331 | 2.0584 | 0.0032 | 50.4543 | 3.9211 | 1.2146 | 0.1944 |
| 2.5579 | 6.0381 | 395000 | 7.8248 | 2.0573 | 0.0032 | 50.3467 | 3.9189 | 1.2146 | 0.1944 |
| 2.5885 | 6.0477 | 400000 | 7.8188 | 2.0565 | 0.0032 | 50.2690 | 3.9174 | 1.2146 | 0.1944 |
| 2.584 | 6.0572 | 405000 | 7.8107 | 2.0555 | 0.0032 | 50.1654 | 3.9153 | 1.2146 | 0.1944 |
| 2.5663 | 6.0668 | 410000 | 7.8039 | 2.0546 | 0.0032 | 50.0788 | 3.9136 | 1.2146 | 0.1944 |
| 2.5658 | 6.0763 | 415000 | 7.7975 | 2.0538 | 0.0032 | 49.9948 | 3.9119 | 1.2146 | 0.1944 |
| 2.5549 | 6.0858 | 420000 | 7.7909 | 2.0530 | 0.0032 | 49.9114 | 3.9102 | 1.2146 | 0.1944 |
| 2.5445 | 6.0954 | 425000 | 7.7842 | 2.0521 | 0.0032 | 49.8271 | 3.9086 | 1.2146 | 0.1944 |
| 2.5732 | 6.1049 | 430000 | 7.7799 | 2.0515 | 0.0032 | 49.7709 | 3.9074 | 1.2146 | 0.1944 |
| 2.5483 | 6.1144 | 435000 | 7.7769 | 2.0512 | 0.0032 | 49.7301 | 3.9066 | 1.2146 | 0.1944 |
| 2.5494 | 7.0048 | 440000 | 7.7702 | 2.0503 | 0.0032 | 49.6461 | 3.9049 | 1.2146 | 0.1944 |
| 2.5482 | 7.0143 | 445000 | 7.7655 | 2.0497 | 0.0032 | 49.5863 | 3.9037 | 1.2146 | 0.1944 |
| 2.5514 | 7.0238 | 450000 | 7.7611 | 2.0491 | 0.0032 | 49.5319 | 3.9026 | 1.2146 | 0.1944 |
| 2.549 | 7.0334 | 455000 | 7.7576 | 2.0487 | 0.0032 | 49.4864 | 3.9017 | 1.2146 | 0.1944 |
| 2.5567 | 7.0429 | 460000 | 7.7537 | 2.0482 | 0.0032 | 49.4372 | 3.9007 | 1.2146 | 0.1944 |
| 2.5555 | 7.0525 | 465000 | 7.7504 | 2.0477 | 0.0032 | 49.3947 | 3.8998 | 1.2146 | 0.1944 |
| 2.5564 | 7.0620 | 470000 | 7.7482 | 2.0475 | 0.0032 | 49.3660 | 3.8993 | 1.2146 | 0.1944 |
| 2.5542 | 7.0715 | 475000 | 7.7453 | 2.0471 | 0.0032 | 49.3283 | 3.8985 | 1.2146 | 0.1944 |
| 2.5627 | 7.0811 | 480000 | 7.7427 | 2.0468 | 0.0032 | 49.2958 | 3.8978 | 1.2146 | 0.1944 |
| 2.5511 | 7.0906 | 485000 | 7.7396 | 2.0463 | 0.0032 | 49.2562 | 3.8970 | 1.2146 | 0.1944 |
| 2.5533 | 7.1001 | 490000 | 7.7372 | 2.0460 | 0.0032 | 49.2271 | 3.8964 | 1.2146 | 0.1944 |
| 2.557 | 7.1097 | 495000 | 7.7363 | 2.0459 | 0.0032 | 49.2147 | 3.8962 | 1.2146 | 0.1944 |
| 2.5467 | 7.1192 | 500000 | 7.7347 | 2.0457 | 0.0032 | 49.1938 | 3.8958 | 1.2146 | 0.1944 |
| 2.5482 | 8.0095 | 505000 | 7.7332 | 2.0455 | 0.0032 | 49.1749 | 3.8954 | 1.2146 | 0.1944 |
| 2.5599 | 8.0191 | 510000 | 7.7323 | 2.0454 | 0.0032 | 49.1640 | 3.8952 | 1.2146 | 0.1944 |
| 2.5788 | 8.0286 | 515000 | 7.7318 | 2.0453 | 0.0032 | 49.1576 | 3.8950 | 1.2146 | 0.1944 |
| 2.5479 | 8.0381 | 520000 | 7.7314 | 2.0453 | 0.0032 | 49.1523 | 3.8949 | 1.2146 | 0.1944 |

Framework versions

  • Transformers 4.57.0.dev0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model stats

  • Downloads last month: 459
  • Model size: 93.4M parameters (Safetensors)
  • Tensor type: F32