mt5-base-b16-e16-t126k-jupyter

This model is a fine-tuned version of google/mt5-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1645
  • Rouge1: 69.0491
  • Rouge2: 62.3577
  • Rougel: 68.9223
  • Rougelsum: 68.9293
  • Gen Len: 19.1285
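
Rouge1, Rouge2, and Rougel above are ROUGE F1 scores: unigram overlap, bigram overlap, and longest-common-subsequence overlap respectively. As a rough illustration of what ROUGE-N measures (a simplified sketch, not the official `rouge_score` implementation, which also applies stemming and other normalization):

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: n-gram overlap between candidate and reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=1))  # ≈ 0.833
print(rouge_n("the cat sat on the mat", "the cat lay on the mat", n=2))  # ≈ 0.6
```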

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 16
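
With lr_scheduler_type: linear, the learning rate presumably decays linearly from 0.001 to 0 over the ~126k training steps. A sketch of the schedule's shape (no warmup is reported, so warmup_steps=0 is an assumption):

```python
def linear_lr(step: int, base_lr: float = 1e-3,
              total_steps: int = 126_000, warmup_steps: int = 0) -> float:
    """Linear schedule: optional linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)

print(linear_lr(0))        # 0.001 at the start (no warmup assumed)
print(linear_lr(63_000))   # 0.0005 at the midpoint
print(linear_lr(126_000))  # 0.0 at the end
```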

Training results

Training Loss Epoch Step Validation Loss Rouge1 Rouge2 Rougel Rougelsum Gen Len
1.4664 0.1269 1000 0.3503 58.6102 49.1959 58.3363 58.338 19.2724
0.4085 0.2538 2000 0.2657 64.9195 55.7744 64.6518 64.6597 19.0740
0.332 0.3808 3000 0.2551 67.2391 58.6115 67.0285 67.0369 19.1507
0.2964 0.5077 4000 0.2240 59.4787 51.7894 59.3096 59.3014 18.8110
0.2776 0.6346 5000 0.2101 48.75 41.3868 48.5559 48.5627 17.7630
0.2596 0.7615 6000 0.2014 64.9022 56.7611 64.7122 64.7321 19.1331
0.2497 0.8884 7000 0.1960 57.5409 50.5573 57.3506 57.3713 19.2094
0.2279 1.0154 8000 0.1953 65.7851 58.3685 65.6045 65.6017 19.1360
0.1745 1.1423 9000 0.1945 69.9367 62.548 69.7904 69.7943 19.2043
0.1762 1.2692 10000 0.1836 64.1157 56.773 63.9375 63.9338 18.8669
0.1692 1.3961 11000 0.1802 68.431 61.0582 68.2591 68.2677 19.2098
0.1678 1.5230 12000 0.1816 67.5347 60.4008 67.3811 67.385 19.2364
0.1691 1.6500 13000 0.1754 65.9637 58.9869 65.8222 65.8294 19.1470
0.1656 1.7769 14000 0.1716 55.9545 49.1856 55.7837 55.7761 18.9288
0.1623 1.9038 15000 0.1668 65.4847 58.5782 65.3217 65.3235 19.2038
0.1469 2.0307 16000 0.1714 65.3287 58.5385 65.1742 65.1803 19.0559
0.1133 2.1576 17000 0.1721 66.502 59.67 66.3759 66.3848 19.0530
0.114 2.2846 18000 0.1787 66.0224 59.2078 65.8882 65.8981 19.1717
0.114 2.4115 19000 0.1725 60.8491 54.101 60.696 60.6942 18.9738
0.1185 2.5384 20000 0.1691 57.2166 50.3259 57.0389 57.0385 18.5442
0.1165 2.6653 21000 0.1677 68.1682 61.5828 68.0211 68.0217 19.2470
0.1173 2.7922 22000 0.1674 69.019 62.4579 68.9009 68.8989 19.1336
0.1156 2.9192 23000 0.1645 69.0491 62.3577 68.9223 68.9293 19.1285
0.1026 3.0461 24000 0.1725 68.3381 61.8488 68.2171 68.2209 18.9749
0.0805 3.1730 25000 0.1746 69.2018 62.7444 69.0733 69.0814 19.1173
0.0813 3.2999 26000 0.1786 68.0695 61.5598 67.9424 67.9475 19.1217
0.0821 3.4268 27000 0.1729 71.4822 65.1469 71.3625 71.3711 19.1623
0.0836 3.5538 28000 0.1787 69.384 62.875 69.2804 69.2795 19.1467
0.0868 3.6807 29000 0.1707 69.9597 63.5038 69.8371 69.8369 19.2096
0.086 3.8076 30000 0.1712 64.6143 58.195 64.4892 64.4989 18.9053
0.085 3.9345 31000 0.1734 64.0262 57.7604 63.8999 63.8955 18.8091
0.0722 4.0614 32000 0.1815 64.4249 58.0731 64.3094 64.3093 18.8705
0.0577 4.1883 33000 0.1827 68.8388 62.3576 68.7079 68.7167 19.1096
0.06 4.3153 34000 0.1829 68.9629 62.5964 68.8241 68.8355 19.1820
0.0628 4.4422 35000 0.1841 67.0498 60.5408 66.8984 66.9192 19.0666
0.0636 4.5691 36000 0.1846 70.3454 63.9895 70.2318 70.2356 19.1360
0.0648 4.6960 37000 0.1783 69.7467 63.4059 69.6243 69.6299 19.1450
0.0642 4.8229 38000 0.1791 68.374 61.7525 68.2523 68.2543 19.0701
0.0659 4.9499 39000 0.1818 68.5635 62.1515 68.4405 68.4462 19.1936
0.0526 5.0768 40000 0.1974 71.1718 64.9477 71.0566 71.0651 19.1645
0.0444 5.2037 41000 0.1865 69.6394 63.2955 69.521 69.5322 19.1499
0.0449 5.3306 42000 0.1904 70.1591 63.6933 70.0387 70.0529 19.1812
0.0457 5.4575 43000 0.1912 71.1048 64.8898 70.9968 71.0021 19.1706
0.0489 5.5845 44000 0.1910 70.9571 64.6543 70.8418 70.846 19.1642
0.0477 5.7114 45000 0.1995 71.1174 64.8599 71.0092 71.0157 19.1676
0.0499 5.8383 46000 0.1883 67.9455 61.5466 67.8268 67.8291 19.0710
0.0489 5.9652 47000 0.1943 69.2924 62.8611 69.1755 69.1852 19.1791
0.0362 6.0921 48000 0.2003 69.6819 63.4608 69.5671 69.5726 19.1258
0.0334 6.2191 49000 0.1999 70.229 63.9271 70.1131 70.1184 19.2295
0.0339 6.3460 50000 0.2004 69.5581 63.2151 69.4493 69.4556 19.2695
0.0357 6.4729 51000 0.2045 69.9536 63.6877 69.827 69.8373 19.1732
0.0362 6.5998 52000 0.2005 68.0243 61.7015 67.9114 67.9188 19.0782
0.0371 6.7267 53000 0.2027 68.7199 62.41 68.6001 68.611 19.1258
0.0376 6.8537 54000 0.2018 66.4473 60.0452 66.3113 66.3162 19.0988
0.0378 6.9806 55000 0.2056 68.6484 62.3742 68.5344 68.5377 19.1435
0.0266 7.1075 56000 0.2195 67.8788 61.538 67.7618 67.7651 19.1034
0.0243 7.2344 57000 0.2179 67.1296 60.7251 67.0162 67.0143 19.0743
0.0264 7.3613 58000 0.2173 66.7227 60.5002 66.6089 66.6165 19.1236
0.0272 7.4883 59000 0.2102 71.1275 65.0313 71.0101 71.0243 19.1706
0.0274 7.6152 60000 0.2163 70.459 64.334 70.3602 70.3636 19.1725
0.0288 7.7421 61000 0.2036 67.5839 61.1639 67.4641 67.4691 19.0064
0.0272 7.8690 62000 0.2085 68.9624 62.7038 68.8432 68.8504 19.1775
0.0291 7.9959 63000 0.2091 67.1602 60.7383 67.0253 67.0379 19.1666
0.0192 8.1229 64000 0.2261 70.048 63.9186 69.9255 69.9291 19.1684
0.0194 8.2498 65000 0.2299 69.2928 63.116 69.1601 69.1736 19.1384
0.0198 8.3767 66000 0.2269 67.6192 61.2499 67.4866 67.492 19.2088
0.0198 8.5036 67000 0.2255 69.7527 63.6416 69.6346 69.6453 19.2061
0.021 8.6305 68000 0.2188 68.2419 62.0352 68.1225 68.1289 19.1070
0.0212 8.7575 69000 0.2180 64.683 58.6031 64.5447 64.5551 18.8629
0.021 8.8844 70000 0.2237 64.7196 58.4031 64.5935 64.6065 18.9414
0.0207 9.0113 71000 0.2286 70.1433 64.1181 70.0359 70.0399 19.1467
0.0144 9.1382 72000 0.2381 67.8838 61.7006 67.7642 67.7786 19.1517
0.0144 9.2651 73000 0.2325 70.1472 64.1045 70.0423 70.0431 19.1717
0.015 9.3921 74000 0.2360 71.3244 65.4014 71.2154 71.2262 19.1433
0.0149 9.5190 75000 0.2410 71.2676 65.3896 71.1525 71.1656 19.1706
0.016 9.6459 76000 0.2351 72.1784 66.3816 72.0702 72.0816 19.1549
0.015 9.7728 77000 0.2380 72.3992 66.5645 72.292 72.3048 19.1705
0.016 9.8997 78000 0.2341 71.0815 65.205 70.986 70.9874 19.1701
0.0144 10.0267 79000 0.2461 70.1223 64.1529 70.0051 70.0088 19.1602
0.0109 10.1536 80000 0.2413 68.6412 62.6327 68.5089 68.5152 19.1642
0.0109 10.2805 81000 0.2531 69.2718 63.1213 69.1523 69.1603 19.1552
0.0113 10.4074 82000 0.2509 69.3696 63.2617 69.2464 69.2622 19.1938
0.0113 10.5343 83000 0.2437 69.2457 63.178 69.1322 69.145 19.1876
0.0109 10.6613 84000 0.2579 69.9983 63.9436 69.8858 69.893 19.1903
0.0114 10.7882 85000 0.2522 68.8232 62.5196 68.7168 68.7286 19.1273
0.0115 10.9151 86000 0.2498 67.6943 61.5001 67.5733 67.5787 19.1484
0.0099 11.0420 87000 0.2649 68.5861 62.4139 68.4787 68.4842 19.1288
0.0078 11.1689 88000 0.2610 67.6684 61.4378 67.5605 67.5663 19.1644
0.008 11.2958 89000 0.2627 69.2172 63.0948 69.1101 69.1167 19.2093
0.0078 11.4228 90000 0.2670 67.9655 61.7532 67.8555 67.8537 19.1457
0.0081 11.5497 91000 0.2627 68.6292 62.4361 68.51 68.5225 19.2017
0.0081 11.6766 92000 0.2675 67.5707 61.4739 67.4536 67.4645 19.1873
0.0081 11.8035 93000 0.2661 67.8674 61.695 67.74 67.7535 19.1516
0.0084 11.9304 94000 0.2678 66.7531 60.6613 66.6401 66.6481 19.1218
0.0071 12.0574 95000 0.2767 67.5904 61.4704 67.4677 67.4743 19.1178
0.0063 12.1843 96000 0.2754 66.3677 60.2218 66.2238 66.2309 19.0790
0.0056 12.3112 97000 0.2846 68.9713 62.9631 68.8408 68.8516 19.1473
0.0057 12.4381 98000 0.2899 69.3481 63.4716 69.2312 69.2378 19.1224
0.0058 12.5650 99000 0.2749 68.866 62.9244 68.7422 68.7552 19.1194
0.0058 12.6920 100000 0.2880 66.9218 60.7514 66.8013 66.8022 19.0786
0.0058 12.8189 101000 0.2912 68.2067 62.1356 68.0816 68.0936 19.1021
0.0058 12.9458 102000 0.2832 66.3243 60.1518 66.1979 66.1978 19.0439
0.0048 13.0727 103000 0.3017 66.5581 60.3887 66.4172 66.4245 19.0610
0.0043 13.1996 104000 0.2920 68.1021 61.953 67.9814 67.9924 19.1312
0.0041 13.3266 105000 0.2969 67.1438 60.934 67.0129 67.0261 19.1098
0.0042 13.4535 106000 0.2998 66.0186 59.8272 65.8811 65.8871 19.0740
0.0043 13.5804 107000 0.2962 66.1395 59.9014 66.0011 66.0105 19.0626
0.0039 13.7073 108000 0.3029 67.6976 61.543 67.5765 67.5771 19.1092
0.004 13.8342 109000 0.3041 68.7212 62.6213 68.6135 68.6104 19.1106
0.0039 13.9612 110000 0.3091 68.164 62.0282 68.0419 68.0521 19.1198
0.0032 14.0881 111000 0.3152 67.4907 61.2721 67.364 67.3715 19.0937
0.0029 14.2150 112000 0.3180 67.2118 61.0255 67.0818 67.0911 19.1074
0.0027 14.3419 113000 0.3256 66.6723 60.4405 66.5353 66.5474 19.0571
0.003 14.4688 114000 0.3196 67.9842 61.7442 67.8502 67.8586 19.1207
0.0028 14.5958 115000 0.3280 68.1337 61.9552 67.9942 68.0032 19.1020
0.0029 14.7227 116000 0.3264 66.8958 60.6476 66.7586 66.7686 19.0682
0.0029 14.8496 117000 0.3232 67.7823 61.5145 67.6528 67.6575 19.1225
0.0031 14.9765 118000 0.3230 67.9894 61.7854 67.8634 67.872 19.1056
0.0025 15.1034 119000 0.3280 68.0723 61.87 67.951 67.9591 19.1074
0.0022 15.2304 120000 0.3352 67.3952 61.2142 67.2731 67.2772 19.0851
0.0024 15.3573 121000 0.3319 67.2694 61.0931 67.1442 67.1523 19.0729
0.002 15.4842 122000 0.3348 67.6768 61.5274 67.5549 67.5622 19.0860
0.0022 15.6111 123000 0.3350 67.8076 61.6275 67.6806 67.687 19.0935
0.0021 15.7380 124000 0.3351 67.8397 61.6565 67.7097 67.7203 19.1006
0.002 15.8650 125000 0.3353 67.8209 61.6651 67.6884 67.6941 19.0988
0.0021 15.9919 126000 0.3358 67.7802 61.6191 67.6505 67.6599 19.0989
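
Note that the headline metrics match the step-23000 row (validation loss 0.1645): training loss keeps falling afterwards while validation loss climbs back above 0.33, which suggests the reported checkpoint is the one with the lowest validation loss rather than the final one. A minimal sketch with a few rows transcribed from the table:

```python
# (step, training_loss, validation_loss) — a handful of rows from the table above
rows = [
    (1000, 1.4664, 0.3503),
    (23000, 0.1156, 0.1645),
    (63000, 0.0291, 0.2091),
    (126000, 0.0021, 0.3358),
]
best = min(rows, key=lambda r: r[2])  # row with the lowest validation loss
print(best)  # (23000, 0.1156, 0.1645) — matches the headline Loss: 0.1645
```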

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.1
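
To reproduce the environment, the versions above could be pinned in a requirements file (package names assumed to correspond to the libraries listed; PyTorch was the CUDA 12.4 build):

```
transformers==4.49.0
torch==2.6.0
datasets==3.4.1
tokenizers==0.21.1
```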
Model size

  • 0.6B params (F32, Safetensors)
Model tree for fresst/mt5-base-b16-e16-t126k-jupyter

  • Base model: google/mt5-base