ord-forward-t5
This model is a fine-tuned version of google-t5/t5-small. The training dataset is not specified in this card. It achieves the following results on the evaluation set:
- Loss: 0.0083
- Bleu: 50.6719
- Exact Match: 0.1988
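The card does not say how Exact Match is computed. A common definition is the fraction of predictions that are string-identical to their reference, which would mean roughly one in five evaluation outputs matched exactly. The sketch below illustrates that definition only; the function name `exact_match`, the whitespace stripping, and the sample strings are assumptions, not the author's evaluation code.

```python
def exact_match(predictions, references):
    """Fraction of predictions identical to their reference string.

    Assumption: leading/trailing whitespace is ignored and no other
    normalization is applied. The card does not document the real rule.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)


# Hypothetical strings, used only to exercise the function.
sample_predictions = ["a b c", "d e", "f"]
sample_references = ["a b c", "d x", "f "]
score = exact_match(sample_predictions, sample_references)
print(f"exact match: {score:.4f}")
```

Under this definition a single differing token makes the whole output a miss, so Exact Match is far stricter than BLEU, which gives partial credit for overlapping n-grams.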
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 5
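Two of the values above follow from the others. The total train batch size is the per-device batch size times the gradient accumulation steps times the number of devices, and a linear scheduler lowers the learning rate from its starting value to zero over the run. The sketch below works through both in plain Python. It assumes a single device and no warmup, neither of which the card states, and it uses 440000 (the last logged step) as a stand-in for the true total step count.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Examples consumed per optimizer update."""
    return per_device_batch * accumulation_steps * num_devices


def linear_lr(step, total_steps, base_lr=3e-4, warmup_steps=0):
    """Learning rate under linear decay to zero.

    Assumption: warmup_steps defaults to 0 because the card lists no warmup.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# 4 examples per device x 4 accumulation steps x 1 device (assumed)
print(effective_batch_size(4, 4))

# Stand-in total; the card does not give the exact final step.
ASSUMED_TOTAL_STEPS = 440_000
for step in (0, 220_000, 440_000):
    print(step, linear_lr(step, ASSUMED_TOTAL_STEPS))
```

The product 4 x 4 = 16 matches the listed total_train_batch_size, which is consistent with single-device training.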
Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Exact Match |
|---|---|---|---|---|---|
| 0.1058 | 0.0226 | 2000 | 0.0872 | 34.8197 | 0.0118 |
| 0.0827 | 0.0453 | 4000 | 0.0695 | 35.6714 | 0.0262 |
| 0.0723 | 0.0679 | 6000 | 0.0600 | 36.9871 | 0.0322 |
| 0.0642 | 0.0906 | 8000 | 0.0529 | 37.5646 | 0.0418 |
| 0.058 | 0.1132 | 10000 | 0.0481 | 38.7551 | 0.0489 |
| 0.054 | 0.1358 | 12000 | 0.0440 | 39.5357 | 0.0566 |
| 0.0497 | 0.1585 | 14000 | 0.0404 | 39.6871 | 0.0619 |
| 0.0469 | 0.1811 | 16000 | 0.0387 | 40.4037 | 0.0673 |
| 0.0446 | 0.2037 | 18000 | 0.0359 | 40.8274 | 0.0720 |
| 0.0422 | 0.2264 | 20000 | 0.0339 | 41.5284 | 0.0766 |
| 0.0415 | 0.2490 | 22000 | 0.0321 | 41.7664 | 0.0803 |
| 0.0394 | 0.2717 | 24000 | 0.0311 | 42.3620 | 0.0854 |
| 0.0376 | 0.2943 | 26000 | 0.0295 | 42.7712 | 0.0915 |
| 0.0363 | 0.3169 | 28000 | 0.0284 | 42.8869 | 0.0953 |
| 0.0348 | 0.3396 | 30000 | 0.0279 | 43.2776 | 0.0971 |
| 0.0335 | 0.3622 | 32000 | 0.0262 | 43.4161 | 0.1005 |
| 0.0338 | 0.3848 | 34000 | 0.0261 | 43.6088 | 0.1032 |
| 0.0318 | 0.4075 | 36000 | 0.0249 | 43.8666 | 0.1074 |
| 0.0309 | 0.4301 | 38000 | 0.0247 | 44.0975 | 0.1073 |
| 0.03 | 0.4528 | 40000 | 0.0235 | 44.1094 | 0.1086 |
| 0.0301 | 0.4754 | 42000 | 0.0232 | 44.5585 | 0.1114 |
| 0.0286 | 0.4980 | 44000 | 0.0228 | 44.6026 | 0.1137 |
| 0.0286 | 0.5207 | 46000 | 0.0222 | 44.7055 | 0.1160 |
| 0.0283 | 0.5433 | 48000 | 0.0218 | 44.7929 | 0.1191 |
| 0.0277 | 0.5660 | 50000 | 0.0217 | 45.0728 | 0.1198 |
| 0.0269 | 0.5886 | 52000 | 0.0211 | 45.0158 | 0.1214 |
| 0.0275 | 0.6112 | 54000 | 0.0207 | 45.3122 | 0.1222 |
| 0.0267 | 0.6339 | 56000 | 0.0201 | 45.3873 | 0.1255 |
| 0.0252 | 0.6565 | 58000 | 0.0200 | 45.4518 | 0.1263 |
| 0.0255 | 0.6791 | 60000 | 0.0193 | 45.6086 | 0.1288 |
| 0.0251 | 0.7018 | 62000 | 0.0195 | 45.6695 | 0.1286 |
| 0.0244 | 0.7244 | 64000 | 0.0188 | 45.7086 | 0.1315 |
| 0.0242 | 0.7471 | 66000 | 0.0187 | 45.7538 | 0.1328 |
| 0.0236 | 0.7697 | 68000 | 0.0187 | 46.0652 | 0.1325 |
| 0.0235 | 0.7923 | 70000 | 0.0181 | 45.9055 | 0.1360 |
| 0.0235 | 0.8150 | 72000 | 0.0180 | 46.0485 | 0.1372 |
| 0.0232 | 0.8376 | 74000 | 0.0177 | 46.1891 | 0.1381 |
| 0.0225 | 0.8602 | 76000 | 0.0175 | 46.1597 | 0.1393 |
| 0.0223 | 0.8829 | 78000 | 0.0172 | 46.3654 | 0.1406 |
| 0.0221 | 0.9055 | 80000 | 0.0172 | 46.3009 | 0.1419 |
| 0.0223 | 0.9282 | 82000 | 0.0170 | 46.5075 | 0.1396 |
| 0.0219 | 0.9508 | 84000 | 0.0166 | 46.5423 | 0.1439 |
| 0.0211 | 0.9734 | 86000 | 0.0165 | 46.6099 | 0.1437 |
| 0.0215 | 0.9961 | 88000 | 0.0163 | 46.5368 | 0.1477 |
| 0.0203 | 1.0187 | 90000 | 0.0163 | 46.7192 | 0.1481 |
| 0.0209 | 1.0413 | 92000 | 0.0160 | 46.7035 | 0.1493 |
| 0.0205 | 1.0640 | 94000 | 0.0159 | 46.8002 | 0.1486 |
| 0.0206 | 1.0866 | 96000 | 0.0157 | 46.8736 | 0.1497 |
| 0.0204 | 1.1093 | 98000 | 0.0158 | 47.0318 | 0.1497 |
| 0.0197 | 1.1319 | 100000 | 0.0156 | 46.9102 | 0.1501 |
| 0.0198 | 1.1545 | 102000 | 0.0154 | 46.9915 | 0.1511 |
| 0.0197 | 1.1772 | 104000 | 0.0154 | 47.0428 | 0.1512 |
| 0.0189 | 1.1998 | 106000 | 0.0153 | 47.1425 | 0.1526 |
| 0.0195 | 1.2225 | 108000 | 0.0150 | 47.0256 | 0.1530 |
| 0.0188 | 1.2451 | 110000 | 0.0149 | 47.1682 | 0.1551 |
| 0.0186 | 1.2677 | 112000 | 0.0149 | 47.1778 | 0.1555 |
| 0.0187 | 1.2904 | 114000 | 0.0148 | 47.2621 | 0.1574 |
| 0.0186 | 1.3130 | 116000 | 0.0145 | 47.3019 | 0.1565 |
| 0.0184 | 1.3356 | 118000 | 0.0145 | 47.4187 | 0.1573 |
| 0.0183 | 1.3583 | 120000 | 0.0143 | 47.1818 | 0.1581 |
| 0.0188 | 1.3809 | 122000 | 0.0143 | 47.5076 | 0.1584 |
| 0.0179 | 1.4036 | 124000 | 0.0142 | 47.5324 | 0.1589 |
| 0.0179 | 1.4262 | 126000 | 0.0141 | 47.5996 | 0.1591 |
| 0.0168 | 1.4488 | 128000 | 0.0139 | 47.4796 | 0.1610 |
| 0.0178 | 1.4715 | 130000 | 0.0139 | 47.4263 | 0.1609 |
| 0.0178 | 1.4941 | 132000 | 0.0138 | 47.5261 | 0.1612 |
| 0.0177 | 1.5167 | 134000 | 0.0137 | 47.6366 | 0.1610 |
| 0.0176 | 1.5394 | 136000 | 0.0135 | 47.7131 | 0.1635 |
| 0.0178 | 1.5620 | 138000 | 0.0135 | 47.7976 | 0.1641 |
| 0.0177 | 1.5847 | 140000 | 0.0134 | 47.7739 | 0.1630 |
| 0.0171 | 1.6073 | 142000 | 0.0133 | 47.8164 | 0.1643 |
| 0.0172 | 1.6299 | 144000 | 0.0132 | 47.6727 | 0.1652 |
| 0.0171 | 1.6526 | 146000 | 0.0131 | 47.8773 | 0.1658 |
| 0.0172 | 1.6752 | 148000 | 0.0130 | 48.0028 | 0.1659 |
| 0.0169 | 1.6979 | 150000 | 0.0131 | 47.9244 | 0.1669 |
| 0.0164 | 1.7205 | 152000 | 0.0129 | 47.9443 | 0.1659 |
| 0.0167 | 1.7431 | 154000 | 0.0128 | 48.0010 | 0.1674 |
| 0.017 | 1.7658 | 156000 | 0.0127 | 48.0952 | 0.1683 |
| 0.0167 | 1.7884 | 158000 | 0.0127 | 47.9715 | 0.1678 |
| 0.0164 | 1.8110 | 160000 | 0.0125 | 48.0665 | 0.1679 |
| 0.0164 | 1.8337 | 162000 | 0.0125 | 48.0916 | 0.1685 |
| 0.0162 | 1.8563 | 164000 | 0.0126 | 48.0780 | 0.1686 |
| 0.0159 | 1.8790 | 166000 | 0.0124 | 48.1669 | 0.1696 |
| 0.0159 | 1.9016 | 168000 | 0.0123 | 48.2502 | 0.1713 |
| 0.0155 | 1.9242 | 170000 | 0.0124 | 48.1843 | 0.1718 |
| 0.0166 | 1.9469 | 172000 | 0.0122 | 48.2317 | 0.1702 |
| 0.0158 | 1.9695 | 174000 | 0.0123 | 48.2473 | 0.1706 |
| 0.0154 | 1.9921 | 176000 | 0.0121 | 48.2233 | 0.1707 |
| 0.0146 | 2.0148 | 178000 | 0.0119 | 48.3750 | 0.1715 |
| 0.0152 | 2.0374 | 180000 | 0.0120 | 48.3732 | 0.1726 |
| 0.0144 | 2.0601 | 182000 | 0.0119 | 48.3003 | 0.1732 |
| 0.0147 | 2.0827 | 184000 | 0.0121 | 48.3438 | 0.1721 |
| 0.0158 | 2.1053 | 186000 | 0.0117 | 48.4250 | 0.1735 |
| 0.0148 | 2.1280 | 188000 | 0.0117 | 48.4373 | 0.1740 |
| 0.0146 | 2.1506 | 190000 | 0.0117 | 48.4079 | 0.1746 |
| 0.015 | 2.1732 | 192000 | 0.0118 | 48.3787 | 0.1724 |
| 0.0146 | 2.1959 | 194000 | 0.0116 | 48.3315 | 0.1757 |
| 0.0148 | 2.2185 | 196000 | 0.0117 | 48.5133 | 0.1734 |
| 0.0149 | 2.2412 | 198000 | 0.0115 | 48.5503 | 0.1755 |
| 0.014 | 2.2638 | 200000 | 0.0114 | 48.6440 | 0.1752 |
| 0.0144 | 2.2864 | 202000 | 0.0114 | 48.4494 | 0.1752 |
| 0.0143 | 2.3091 | 204000 | 0.0113 | 48.5171 | 0.1761 |
| 0.0147 | 2.3317 | 206000 | 0.0114 | 48.5049 | 0.1756 |
| 0.0144 | 2.3544 | 208000 | 0.0114 | 48.6505 | 0.1769 |
| 0.0143 | 2.3770 | 210000 | 0.0113 | 48.5626 | 0.1769 |
| 0.0143 | 2.3996 | 212000 | 0.0114 | 48.7282 | 0.1768 |
| 0.0143 | 2.4223 | 214000 | 0.0112 | 48.6750 | 0.1763 |
| 0.0139 | 2.4449 | 216000 | 0.0111 | 48.7042 | 0.1779 |
| 0.0145 | 2.4675 | 218000 | 0.0110 | 48.6840 | 0.1780 |
| 0.0138 | 2.4902 | 220000 | 0.0109 | 48.7209 | 0.1788 |
| 0.0144 | 2.5128 | 222000 | 0.0111 | 48.7628 | 0.1809 |
| 0.0144 | 2.5355 | 224000 | 0.0108 | 48.7092 | 0.1787 |
| 0.0138 | 2.5581 | 226000 | 0.0108 | 48.7748 | 0.1795 |
| 0.014 | 2.5807 | 228000 | 0.0108 | 48.7813 | 0.1795 |
| 0.014 | 2.6034 | 230000 | 0.0108 | 48.8293 | 0.1792 |
| 0.0142 | 2.6260 | 232000 | 0.0108 | 48.8267 | 0.1803 |
| 0.0135 | 2.6486 | 234000 | 0.0107 | 48.8707 | 0.1810 |
| 0.0136 | 2.6713 | 236000 | 0.0107 | 48.8956 | 0.1806 |
| 0.0141 | 2.6939 | 238000 | 0.0106 | 48.9467 | 0.1813 |
| 0.0138 | 2.7166 | 240000 | 0.0106 | 48.8912 | 0.1795 |
| 0.0135 | 2.7392 | 242000 | 0.0106 | 48.8954 | 0.1814 |
| 0.0138 | 2.7618 | 244000 | 0.0105 | 49.0803 | 0.1818 |
| 0.0135 | 2.7845 | 246000 | 0.0104 | 48.9452 | 0.1821 |
| 0.013 | 2.8071 | 248000 | 0.0105 | 49.0192 | 0.1838 |
| 0.0134 | 2.8298 | 250000 | 0.0104 | 48.9696 | 0.1822 |
| 0.013 | 2.8524 | 252000 | 0.0103 | 48.9338 | 0.1817 |
| 0.0137 | 2.8750 | 254000 | 0.0103 | 49.0249 | 0.1827 |
| 0.0131 | 2.8977 | 256000 | 0.0103 | 49.0570 | 0.1827 |
| 0.0136 | 2.9203 | 258000 | 0.0102 | 49.1415 | 0.1844 |
| 0.0136 | 2.9429 | 260000 | 0.0102 | 49.1007 | 0.1836 |
| 0.0132 | 2.9656 | 262000 | 0.0102 | 49.0411 | 0.1843 |
| 0.0128 | 2.9882 | 264000 | 0.0102 | 49.1262 | 0.1844 |
| 0.0123 | 3.0109 | 266000 | 0.0101 | 49.1346 | 0.1841 |
| 0.0123 | 3.0335 | 268000 | 0.0101 | 49.1488 | 0.1838 |
| 0.0121 | 3.0561 | 270000 | 0.0100 | 49.1694 | 0.1852 |
| 0.0125 | 3.0788 | 272000 | 0.0100 | 49.1937 | 0.1858 |
| 0.0122 | 3.1014 | 274000 | 0.0100 | 49.1364 | 0.1856 |
| 0.0126 | 3.1240 | 276000 | 0.0100 | 49.1915 | 0.1844 |
| 0.0126 | 3.1467 | 278000 | 0.0100 | 49.1607 | 0.1850 |
| 0.0126 | 3.1693 | 280000 | 0.0099 | 49.1567 | 0.1842 |
| 0.0126 | 3.1920 | 282000 | 0.0099 | 49.2994 | 0.1860 |
| 0.0127 | 3.2146 | 284000 | 0.0098 | 49.2967 | 0.1856 |
| 0.0123 | 3.2372 | 286000 | 0.0099 | 49.2657 | 0.1869 |
| 0.0122 | 3.2599 | 288000 | 0.0098 | 49.3254 | 0.1873 |
| 0.0124 | 3.2825 | 290000 | 0.0098 | 49.3960 | 0.1869 |
| 0.0119 | 3.3051 | 292000 | 0.0097 | 49.3278 | 0.1871 |
| 0.0123 | 3.3278 | 294000 | 0.0097 | 49.3128 | 0.1861 |
| 0.0124 | 3.3504 | 296000 | 0.0096 | 49.3106 | 0.1879 |
| 0.0122 | 3.3731 | 298000 | 0.0097 | 49.3737 | 0.1891 |
| 0.0118 | 3.3957 | 300000 | 0.0096 | 49.3818 | 0.1884 |
| 0.0121 | 3.4183 | 302000 | 0.0096 | 49.4057 | 0.1893 |
| 0.0123 | 3.4410 | 304000 | 0.0096 | 49.4641 | 0.1882 |
| 0.0124 | 3.4636 | 306000 | 0.0095 | 49.3291 | 0.1886 |
| 0.012 | 3.4863 | 308000 | 0.0095 | 49.4946 | 0.1890 |
| 0.0121 | 3.5089 | 310000 | 0.0095 | 49.3872 | 0.1892 |
| 0.0121 | 3.5315 | 312000 | 0.0094 | 49.4517 | 0.1904 |
| 0.0121 | 3.5542 | 314000 | 0.0094 | 49.4236 | 0.1904 |
| 0.0122 | 3.5768 | 316000 | 0.0094 | 49.4295 | 0.1890 |
| 0.0115 | 3.5994 | 318000 | 0.0094 | 49.5112 | 0.1899 |
| 0.0113 | 3.6221 | 320000 | 0.0093 | 49.4791 | 0.1902 |
| 0.0117 | 3.6447 | 322000 | 0.0093 | 49.5464 | 0.1907 |
| 0.012 | 3.6674 | 324000 | 0.0093 | 49.5608 | 0.1908 |
| 0.0122 | 3.6900 | 326000 | 0.0093 | 49.5088 | 0.1901 |
| 0.0121 | 3.7126 | 328000 | 0.0092 | 49.6321 | 0.1912 |
| 0.0119 | 3.7353 | 330000 | 0.0092 | 49.5775 | 0.1915 |
| 0.0123 | 3.7579 | 332000 | 0.0092 | 49.5409 | 0.1910 |
| 0.0117 | 3.7805 | 334000 | 0.0091 | 49.6303 | 0.1919 |
| 0.0117 | 3.8032 | 336000 | 0.0091 | 49.6150 | 0.1912 |
| 0.0112 | 3.8258 | 338000 | 0.0091 | 49.6075 | 0.1913 |
| 0.0116 | 3.8485 | 340000 | 0.0091 | 49.5985 | 0.1914 |
| 0.0114 | 3.8711 | 342000 | 0.0091 | 49.6093 | 0.1920 |
| 0.0114 | 3.8937 | 344000 | 0.0090 | 49.6152 | 0.1921 |
| 0.0119 | 3.9164 | 346000 | 0.0090 | 49.6228 | 0.1926 |
| 0.0113 | 3.9390 | 348000 | 0.0090 | 49.6626 | 0.1925 |
| 0.0113 | 3.9617 | 350000 | 0.0089 | 49.6894 | 0.1925 |
| 0.0113 | 3.9843 | 352000 | 0.0090 | 49.7588 | 0.1919 |
| 0.0108 | 4.0069 | 354000 | 0.0090 | 49.7142 | 0.1942 |
| 0.0113 | 4.0296 | 356000 | 0.0089 | 49.7560 | 0.1934 |
| 0.0111 | 4.0522 | 358000 | 0.0089 | 49.7952 | 0.1952 |
| 0.011 | 4.0748 | 360000 | 0.0089 | 49.7782 | 0.1944 |
| 0.0108 | 4.0975 | 362000 | 0.0089 | 49.7355 | 0.1944 |
| 0.0109 | 4.1201 | 364000 | 0.0089 | 49.7382 | 0.1947 |
| 0.0109 | 4.1428 | 366000 | 0.0088 | 49.7860 | 0.1942 |
| 0.011 | 4.1654 | 368000 | 0.0087 | 49.7896 | 0.1945 |
| 0.011 | 4.1880 | 370000 | 0.0088 | 49.7495 | 0.1947 |
| 0.0112 | 4.2107 | 372000 | 0.0088 | 49.7334 | 0.1944 |
| 0.0107 | 4.2333 | 374000 | 0.0088 | 49.7848 | 0.1943 |
| 0.0104 | 4.2559 | 376000 | 0.0088 | 49.8388 | 0.1945 |
| 0.0109 | 4.2786 | 378000 | 0.0087 | 49.7591 | 0.1936 |
| 0.0106 | 4.3012 | 380000 | 0.0087 | 49.8372 | 0.1950 |
| 0.0108 | 4.3239 | 382000 | 0.0087 | 49.8242 | 0.1955 |
| 0.0104 | 4.3465 | 384000 | 0.0087 | 49.8844 | 0.1962 |
| 0.0107 | 4.3691 | 386000 | 0.0087 | 49.8759 | 0.1957 |
| 0.0107 | 4.3918 | 388000 | 0.0086 | 49.8460 | 0.1958 |
| 0.0109 | 4.4144 | 390000 | 0.0086 | 49.8999 | 0.1958 |
| 0.0106 | 4.4370 | 392000 | 0.0086 | 49.9260 | 0.1956 |
| 0.0105 | 4.4597 | 394000 | 0.0086 | 49.9298 | 0.1964 |
| 0.0108 | 4.4823 | 396000 | 0.0086 | 49.9201 | 0.1964 |
| 0.0105 | 4.5050 | 398000 | 0.0086 | 49.8979 | 0.1958 |
| 0.0107 | 4.5276 | 400000 | 0.0085 | 49.9009 | 0.1958 |
| 0.0107 | 4.5502 | 402000 | 0.0085 | 49.8843 | 0.1968 |
| 0.0105 | 4.5729 | 404000 | 0.0085 | 49.9003 | 0.1970 |
| 0.0104 | 4.5955 | 406000 | 0.0085 | 49.8893 | 0.1965 |
| 0.0108 | 4.6182 | 408000 | 0.0085 | 49.9278 | 0.1969 |
| 0.0106 | 4.6408 | 410000 | 0.0085 | 49.9527 | 0.1968 |
| 0.0105 | 4.6634 | 412000 | 0.0084 | 49.9209 | 0.1970 |
| 0.0099 | 4.6861 | 414000 | 0.0085 | 49.9390 | 0.1965 |
| 0.0105 | 4.7087 | 416000 | 0.0084 | 49.9598 | 0.1967 |
| 0.0107 | 4.7313 | 418000 | 0.0084 | 49.9966 | 0.1981 |
| 0.0108 | 4.7540 | 420000 | 0.0084 | 49.9902 | 0.1978 |
| 0.0107 | 4.7766 | 422000 | 0.0084 | 49.9369 | 0.1975 |
| 0.0112 | 4.7993 | 424000 | 0.0084 | 49.9650 | 0.1976 |
| 0.0102 | 4.8219 | 426000 | 0.0084 | 49.9707 | 0.1972 |
| 0.0105 | 4.8445 | 428000 | 0.0084 | 49.9702 | 0.1975 |
| 0.01 | 4.8672 | 430000 | 0.0084 | 49.9767 | 0.1976 |
| 0.0105 | 4.8898 | 432000 | 0.0083 | 49.9639 | 0.1976 |
| 0.0102 | 4.9124 | 434000 | 0.0084 | 49.9858 | 0.1978 |
| 0.0101 | 4.9351 | 436000 | 0.0083 | 49.9975 | 0.1978 |
| 0.0103 | 4.9577 | 438000 | 0.0083 | 49.9776 | 0.1977 |
| 0.01 | 4.9804 | 440000 | 0.0083 | 49.9956 | 0.1975 |
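The Epoch and Step columns give a rough estimate of the training set size, which the card does not report. Dividing a logged step by its epoch fraction gives the optimizer steps per epoch, and multiplying by the total train batch size of 16 gives an approximate example count. The sketch below uses the last table row. The result is an estimate only, since the epoch values are rounded to four decimals.

```python
# (epoch, step) copied from the last row of the training results table
LAST_EPOCH = 4.9804
LAST_STEP = 440_000
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list


def steps_per_epoch(epoch, step):
    """Optimizer steps in one full pass over the training data."""
    return step / epoch


def approx_train_examples(epoch, step, batch_size=TOTAL_TRAIN_BATCH_SIZE):
    """Rough training-set size implied by the logged epoch fraction."""
    return steps_per_epoch(epoch, step) * batch_size


est_steps = steps_per_epoch(LAST_EPOCH, LAST_STEP)
est_examples = approx_train_examples(LAST_EPOCH, LAST_STEP)
print(f"steps per epoch (approx.): {est_steps:,.0f}")
print(f"training examples (approx.): {est_examples:,.0f}")
```

This works out to about 88,000 steps per epoch, or on the order of 1.4 million training examples.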
Framework versions
- Transformers 4.57.1
- Pytorch 2.3.1+cu121
- Datasets 4.3.0
- Tokenizers 0.22.1
Model tree for smitathkr1/ord-forward-t5
- Base model: google-t5/t5-small