byt5-base-b16-e16-126k-jupyter

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0731
  • Rouge1: 29.055
  • Rouge2: 18.8448
  • Rougel: 28.9976
  • Rougelsum: 29.001
  • Gen Len: 19.9828
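
Rouge1 and Rouge2 above are word n-gram overlap F-measures (the table appears to report them scaled by 100). A minimal pure-Python sketch of ROUGE-N — an illustrative helper, not the evaluation code used for this card, which normally comes from the `rouge_score` package:

```python
from collections import Counter

def rouge_n(reference: str, candidate: str, n: int) -> float:
    """F1 overlap of word n-grams between a reference and a candidate."""
    def ngrams(text: str) -> Counter:
        toks = text.split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    if not ref or not cand:
        return 0.0
    overlap = sum((ref & cand).values())  # clipped n-gram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 of 3 unigrams overlap, so P = R = F1 = 2/3
score = rouge_n("the cat sat", "the cat ran", 1)
```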

Model description

More information needed
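
The name suggests a ByT5-base checkpoint trained on Jupyter-notebook data, though the card does not confirm this. ByT5 models operate on raw UTF-8 bytes rather than subwords: each byte maps to token id `byte + 3`, with ids 0–2 reserved for pad, eos, and unk. A minimal sketch of that mapping:

```python
def byt5_encode(text: str) -> list[int]:
    """Map text to ByT5 token ids: UTF-8 bytes offset by the 3 reserved
    specials (0=pad, 1=eos, 2=unk), with eos appended."""
    return [b + 3 for b in text.encode("utf-8")] + [1]

ids = byt5_encode("hi")  # 'h'=104 -> 107, 'i'=105 -> 108, then eos id 1
```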

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 16
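
These settings correspond roughly to a Hugging Face Seq2SeqTrainingArguments configuration, sketched below; the output_dir and predict_with_generate values are assumptions not stated in the card:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="byt5-base-b16-e16-126k-jupyter",  # assumed
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=16,
    predict_with_generate=True,  # assumed; needed to report ROUGE and Gen Len
)
```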

Training results

Training Loss Epoch Step Gen Len Validation Loss Rouge1 Rouge2 Rougel Rougelsum
0.143 0.1269 1000 19.9884 0.1352 27.1289 16.326 26.9924 26.9924
0.1374 0.2538 2000 19.9834 0.1207 27.4989 16.7093 27.3352 27.3345
0.1262 0.3808 3000 19.9863 0.1125 27.6424 16.969 27.4981 27.5007
0.1194 0.5077 4000 19.9834 0.1064 27.7983 17.2238 27.6817 27.6785
0.116 0.6346 5000 19.9843 0.1041 27.9324 17.3766 27.8127 27.8122
0.111 0.7615 6000 19.9840 0.1010 27.9527 17.4353 27.8598 27.8584
0.1078 0.8884 7000 19.9866 0.1009 28.1459 17.6823 28.0504 28.0511
0.103 1.0154 8000 19.9920 0.0941 28.2145 17.8042 28.1241 28.1247
0.0923 1.1423 9000 19.9831 0.0942 28.3315 17.9205 28.2569 28.2592
0.0933 1.2692 10000 19.9839 0.0903 28.4024 17.9947 28.3122 28.3171
0.0895 1.3961 11000 19.9802 0.0896 28.3964 18.0166 28.3175 28.3144
0.0893 1.5230 12000 19.9863 0.0891 28.4398 18.0797 28.3516 28.3539
0.0896 1.6500 13000 19.9861 0.0870 28.5396 18.1916 28.4564 28.4554
0.0884 1.7769 14000 19.9849 0.0862 28.597 18.2832 28.5238 28.5217
0.0879 1.9038 15000 19.9831 0.0829 28.5972 18.2652 28.5092 28.5094
0.0824 2.0307 16000 19.9833 0.0838 28.5972 18.2937 28.5196 28.5196
0.0747 2.1576 17000 19.9820 0.0828 28.693 18.3776 28.6099 28.6083
0.0749 2.2846 18000 19.9834 0.0815 28.7271 18.4627 28.6437 28.6485
0.0743 2.4115 19000 19.9831 0.0813 28.7775 18.4964 28.6982 28.7005
0.0757 2.5384 20000 19.9839 0.0801 28.7398 18.4692 28.658 28.6631
0.0735 2.6653 21000 19.9832 0.0784 28.83 18.5523 28.7598 28.7605
0.0744 2.7922 22000 19.9868 0.0776 28.8767 18.6342 28.8058 28.8121
0.0734 2.9192 23000 19.9839 0.0763 28.8778 18.6292 28.8003 28.799
0.0686 3.0461 24000 19.9824 0.0781 28.8677 18.6348 28.8082 28.809
0.0616 3.1730 25000 19.9840 0.0779 28.8648 18.6322 28.7984 28.7965
0.0625 3.2999 26000 19.9835 0.0779 28.9743 18.7674 28.9072 28.9049
0.0627 3.4268 27000 19.9864 0.0760 28.9522 18.7617 28.8819 28.8857
0.0634 3.5538 28000 19.9824 0.0767 28.9535 18.7104 28.8768 28.8762
0.0632 3.6807 29000 19.9828 0.0743 29.0383 18.8419 28.9707 28.9702
0.0633 3.8076 30000 19.9827 0.0741 28.9955 18.8184 28.9381 28.9385
0.0633 3.9345 31000 19.9828 0.0731 29.055 18.8448 28.9976 29.001
0.056 4.0614 32000 19.9828 0.0756 29.0131 18.8287 28.9536 28.9531
0.0507 4.1883 33000 19.9838 0.0768 29.0413 18.8688 28.9795 28.9781
0.052 4.3153 34000 19.9827 0.0768 29.0136 18.8229 28.9491 28.9524
0.0533 4.4422 35000 19.9840 0.0750 29.1284 18.9655 29.0704 29.0699
0.0539 4.5691 36000 19.9823 0.0739 29.1427 18.9707 29.0868 29.0842
0.0534 4.6960 37000 19.9823 0.0744 29.1282 18.942 29.0638 29.0613
0.0537 4.8229 38000 19.9834 0.0743 29.1405 18.9703 29.081 29.0848
0.0531 4.9499 39000 19.9827 0.0733 29.1755 19.0099 29.1112 29.1091
0.0466 5.0768 40000 19.9836 0.0801 29.1762 18.9998 29.1105 29.1132
0.0421 5.2037 41000 19.9830 0.0779 29.1067 18.9289 29.0431 29.0431
0.0422 5.3306 42000 19.9835 0.0783 29.0971 18.9297 29.045 29.0417
0.0431 5.4575 43000 19.9830 0.0771 29.2096 19.0405 29.1555 29.1537
0.0444 5.5845 44000 19.9828 0.0764 29.2058 19.0438 29.1506 29.1506
0.0445 5.7114 45000 19.9832 0.0762 29.1971 19.0553 29.1431 29.1453
0.0453 5.8383 46000 19.9832 0.0744 29.2157 19.0608 29.1514 29.1527
0.0456 5.9652 47000 19.9822 0.0754 29.2759 19.1662 29.2201 29.2193
0.0354 6.0921 48000 19.9829 0.0853 29.2266 19.0968 29.1646 29.1658
0.0335 6.2191 49000 19.9830 0.0833 29.2124 19.0662 29.1517 29.1512
0.0343 6.3460 50000 19.9829 0.0814 29.1874 19.0445 29.1303 29.1305
0.035 6.4729 51000 19.9829 0.0795 29.2572 19.1297 29.205 29.2025
0.0363 6.5998 52000 19.9832 0.0804 29.2771 19.1546 29.2125 29.2101
0.036 6.7267 53000 19.9832 0.0793 29.2939 19.1708 29.2295 29.2337
0.0362 6.8537 54000 19.9827 0.0790 29.2848 19.1809 29.228 29.2281
0.0377 6.9806 55000 19.9827 0.0789 29.334 19.2304 29.27 29.2717
0.0271 7.1075 56000 19.9828 0.0886 29.2694 19.1371 29.2094 29.2076
0.0258 7.2344 57000 19.9833 0.0869 29.2911 19.1913 29.2395 29.2376
0.0271 7.3613 58000 19.9830 0.0860 29.296 19.1582 29.2379 29.2382
0.0275 7.4883 59000 19.9828 0.0872 29.2824 19.1659 29.2199 29.2185
0.0284 7.6152 60000 19.9825 0.0878 29.2562 19.1574 29.19 29.1956
0.029 7.7421 61000 19.9826 0.0843 29.2729 19.163 29.2075 29.2086
0.0287 7.8690 62000 19.9832 0.0876 29.2633 19.1554 29.2094 29.21
0.0292 7.9959 63000 19.9832 0.0846 29.3149 19.2081 29.2612 29.2599
0.0193 8.1229 64000 19.9824 0.0990 29.322 19.1889 29.2616 29.2634
0.0197 8.2498 65000 19.9831 0.0956 29.3183 19.1969 29.2607 29.2614
0.0205 8.3767 66000 19.9827 0.0974 29.3071 19.2163 29.2557 29.2523
0.0211 8.5036 67000 19.9832 0.0956 29.3174 19.2197 29.2609 29.2587
0.0217 8.6305 68000 19.9834 0.0949 29.3249 19.2512 29.266 29.2638
0.0215 8.7575 69000 19.9825 0.0944 29.3979 19.3167 29.3489 29.3482
0.0223 8.8844 70000 19.9829 0.0928 29.3568 19.2632 29.2984 29.3008
0.0216 9.0113 71000 19.9835 0.1015 29.364 19.2665 29.3146 29.314
0.0137 9.1382 72000 19.9830 0.1089 29.4083 19.3234 29.3475 29.3477
0.0145 9.2651 73000 19.9826 0.1056 29.3973 19.3188 29.3429 29.3461
0.0153 9.3921 74000 19.9831 0.1039 29.3934 19.2828 29.335 29.3387
0.0155 9.5190 75000 19.9833 0.1042 29.4187 19.3107 29.3607 29.3578
0.0157 9.6459 76000 19.9831 0.1031 29.4138 19.3115 29.3507 29.3539
0.0158 9.7728 77000 19.9823 0.1060 29.3759 19.2998 29.3157 29.3171
0.0164 9.8997 78000 19.9832 0.1015 29.4029 19.3212 29.3448 29.3405
0.0149 10.0267 79000 19.9829 0.1131 29.3914 19.3339 29.3374 29.3384
0.0098 10.1536 80000 19.9826 0.1145 29.3674 19.285 29.3144 29.3155
0.0102 10.2805 81000 19.9820 0.1157 29.3795 19.289 29.3219 29.3225
0.0106 10.4074 82000 19.9826 0.1130 29.3967 19.3087 29.3429 29.3442
0.0109 10.5343 83000 19.9822 0.1153 29.3567 19.2608 29.302 29.2994
0.0111 10.6613 84000 19.9830 0.1102 29.4025 19.3139 29.3508 29.3489
0.0112 10.7882 85000 19.9824 0.1137 29.388 19.3308 29.3372 29.3364
0.0112 10.9151 86000 19.9839 0.1131 29.332 19.2474 29.2773 29.2767
0.0099 11.0420 87000 19.9827 0.1265 29.4045 19.3349 29.3572 29.3556
0.0068 11.1689 88000 19.9832 0.1260 29.3837 19.3078 29.3288 29.3311
0.0073 11.2958 89000 19.9831 0.1272 29.4087 19.3054 29.3577 29.3569
0.0072 11.4228 90000 19.9833 0.1247 29.4105 19.3322 29.3677 29.3661
0.0074 11.5497 91000 19.9834 0.1260 29.4055 19.3407 29.3556 29.3559
0.0074 11.6766 92000 19.9832 0.1249 29.4132 19.3448 29.3621 29.362
0.0077 11.8035 93000 19.9831 0.1270 29.4329 19.375 29.3821 29.3841
0.0075 11.9304 94000 19.9823 0.1238 29.4467 19.3887 29.3919 29.393
0.0062 12.0574 95000 19.9830 0.1377 29.4483 19.3615 29.3884 29.3887
0.0046 12.1843 96000 19.9827 0.1373 29.4335 19.3566 29.3803 29.3797
0.0047 12.3112 97000 19.9829 0.1405 29.4673 19.4217 29.4141 29.4156
0.0049 12.4381 98000 19.9830 0.1400 29.4356 19.3731 29.3825 29.3816
0.0049 12.5650 99000 19.9829 0.1398 29.4273 19.3572 29.3761 29.3732
0.0049 12.6920 100000 19.9825 0.1402 29.4287 19.3521 29.371 29.3714
0.0049 12.8189 101000 19.9832 0.1380 29.4725 19.416 29.4213 29.4179
0.0048 12.9458 102000 19.9826 0.1401 29.4868 19.4259 29.4365 29.4349
0.0038 13.0727 103000 19.9829 0.1475 29.4834 19.4238 29.4304 29.4293
0.0031 13.1996 104000 19.9830 0.1542 29.4768 19.4135 29.4248 29.426
0.0031 13.3266 105000 19.9832 0.1509 29.488 19.4293 29.4396 29.4393
0.0031 13.4535 106000 19.9828 0.1515 29.5085 19.4347 29.4534 29.4551
0.003 13.5804 107000 19.9832 0.1510 29.5041 19.4501 29.4481 29.449
0.0032 13.7073 108000 19.9832 0.1538 29.4726 19.4144 29.4241 29.4217
0.0032 13.8342 109000 19.9829 0.1546 29.4617 19.3972 29.4076 29.4102
0.003 13.9612 110000 19.9824 0.1557 29.462 19.409 29.4092 29.4137
0.0024 14.0881 111000 19.9827 0.1600 29.4882 19.4255 29.4348 29.4349
0.002 14.2150 112000 19.9828 0.1659 29.4852 19.4236 29.4328 29.4334
0.002 14.3419 113000 19.9825 0.1701 29.4727 19.411 29.4151 29.419
0.002 14.4688 114000 19.9830 0.1654 29.508 19.4256 29.4462 29.452
0.0021 14.5958 115000 19.9828 0.1656 29.4828 19.4156 29.4278 29.4286
0.0019 14.7227 116000 19.9826 0.1677 29.4924 19.4291 29.4385 29.4412
0.0019 14.8496 117000 19.9821 0.1695 29.5279 19.4686 29.4749 29.4762
0.0019 14.9765 118000 19.9821 0.1703 29.5104 19.4567 29.4541 29.457
0.0015 15.1034 119000 19.9821 0.1784 29.4925 19.4395 29.4366 29.4411
0.0015 15.2304 120000 19.9817 0.1780 29.5039 19.4525 29.4461 29.4498
0.0014 15.3573 121000 19.9822 0.1793 29.4941 19.444 29.4367 29.4406
0.0014 15.4842 122000 19.9821 0.1815 29.5074 19.4576 29.4523 29.4539
0.0012 15.6111 123000 19.9818 0.1842 29.5083 19.466 29.4552 29.4584
0.0013 15.7380 124000 19.9823 0.1833 29.5261 19.4894 29.4704 29.4739
0.0013 15.8650 125000 19.9825 0.1836 29.5206 19.4849 29.4653 29.4672
0.0012 15.9919 126000 19.9820 0.1840 29.5204 19.4851 29.4651 29.4674
0.0012 16.0 126064 19.9820 0.1840 29.5204 19.4851 29.4651 29.4674
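
Note that validation loss bottoms out around epoch 4 (0.0731, the headline result) while ROUGE keeps improving slowly afterward. With lr_scheduler_type linear and 126,064 total steps, the learning rate decays linearly from 1e-3 toward 0; a hypothetical helper, assuming no warmup since none is listed:

```python
def linear_lr(step: int, base_lr: float = 1e-3, total_steps: int = 126064) -> float:
    """Linearly decayed learning rate with no warmup (assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

half = linear_lr(63032)  # halfway through training -> 5e-4
```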

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.4.1
  • Tokenizers 0.21.1
Model size

  • 0.6B params (F32 tensors, Safetensors format)