# 4613e52fc5486ec23a4cb323d2180803
This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [en-fi] dataset. It achieves the following results on the evaluation set:
- Loss: 3.0263
- Data Size: 1.0
- Epoch Runtime: 15.6265
- Bleu: 2.6435
## Model description
More information needed
## Intended uses & limitations
More information needed
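The card does not document a usage recipe. As a minimal sketch, the model can presumably be loaded with the standard `transformers` seq2seq API; the repository id below is taken from this card's model tree, and the plain source-text input (no task prefix) is an assumption, since the fine-tuning script is not published:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id from this model card; input format is an assumption.
model_id = "contemmcm/4613e52fc5486ec23a4cb323d2180803"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# English source sentence to translate into Finnish
text = "The book is on the table."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the final BLEU of ~2.6, translations from this checkpoint should be expected to be low quality.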
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 26.8070 | 0 | 1.8649 | 0.0019 |
| No log | 1 | 91 | 26.6464 | 0.0078 | 2.9198 | 0.0021 |
| No log | 2 | 182 | 25.2832 | 0.0156 | 2.7108 | 0.0024 |
| No log | 3 | 273 | 23.9219 | 0.0312 | 3.1279 | 0.0022 |
| No log | 4 | 364 | 22.2360 | 0.0625 | 3.4430 | 0.0025 |
| No log | 5 | 455 | 18.5403 | 0.125 | 4.3440 | 0.0030 |
| No log | 6 | 546 | 14.9720 | 0.25 | 6.0982 | 0.0038 |
| 2.1573 | 7 | 637 | 10.0920 | 0.5 | 9.4463 | 0.0057 |
| 10.8751 | 8.0 | 728 | 6.6132 | 1.0 | 16.2321 | 0.0266 |
| 7.1163 | 9.0 | 819 | 4.4833 | 1.0 | 14.2330 | 0.0333 |
| 5.5747 | 10.0 | 910 | 4.0264 | 1.0 | 14.7734 | 0.4297 |
| 5.0514 | 11.0 | 1001 | 3.8025 | 1.0 | 15.0448 | 0.6325 |
| 4.8985 | 12.0 | 1092 | 3.6709 | 1.0 | 15.4207 | 0.8203 |
| 4.6751 | 13.0 | 1183 | 3.5885 | 1.0 | 15.1690 | 1.1198 |
| 4.5114 | 14.0 | 1274 | 3.5234 | 1.0 | 15.3498 | 1.3589 |
| 4.3597 | 15.0 | 1365 | 3.4709 | 1.0 | 14.0717 | 1.4570 |
| 4.2411 | 16.0 | 1456 | 3.4312 | 1.0 | 14.1590 | 1.5430 |
| 4.1973 | 17.0 | 1547 | 3.3930 | 1.0 | 15.4894 | 1.5711 |
| 4.1391 | 18.0 | 1638 | 3.3629 | 1.0 | 14.7562 | 1.6843 |
| 4.0203 | 19.0 | 1729 | 3.3375 | 1.0 | 14.6328 | 1.7110 |
| 3.9747 | 20.0 | 1820 | 3.3116 | 1.0 | 14.5719 | 1.7470 |
| 3.9412 | 21.0 | 1911 | 3.2892 | 1.0 | 15.0229 | 1.8933 |
| 3.861 | 22.0 | 2002 | 3.2745 | 1.0 | 14.2382 | 1.9115 |
| 3.861 | 23.0 | 2093 | 3.2472 | 1.0 | 14.5442 | 1.9207 |
| 3.7894 | 24.0 | 2184 | 3.2347 | 1.0 | 14.7812 | 2.0149 |
| 3.7753 | 25.0 | 2275 | 3.2174 | 1.0 | 15.1183 | 2.0566 |
| 3.7182 | 26.0 | 2366 | 3.2017 | 1.0 | 14.5415 | 2.0655 |
| 3.7048 | 27.0 | 2457 | 3.1873 | 1.0 | 14.6949 | 2.1741 |
| 3.6486 | 28.0 | 2548 | 3.1744 | 1.0 | 14.9361 | 2.1382 |
| 3.6047 | 29.0 | 2639 | 3.1601 | 1.0 | 15.7903 | 2.1818 |
| 3.5836 | 30.0 | 2730 | 3.1512 | 1.0 | 14.1550 | 2.2143 |
| 3.5298 | 31.0 | 2821 | 3.1389 | 1.0 | 14.6037 | 2.2481 |
| 3.524 | 32.0 | 2912 | 3.1312 | 1.0 | 14.4274 | 2.2530 |
| 3.4796 | 33.0 | 3003 | 3.1226 | 1.0 | 15.4424 | 2.2744 |
| 3.4087 | 34.0 | 3094 | 3.1153 | 1.0 | 15.5786 | 2.3144 |
| 3.4386 | 35.0 | 3185 | 3.1058 | 1.0 | 15.4120 | 2.2947 |
| 3.3912 | 36.0 | 3276 | 3.0996 | 1.0 | 15.5306 | 2.2894 |
| 3.3854 | 37.0 | 3367 | 3.0888 | 1.0 | 14.0790 | 2.3126 |
| 3.3341 | 38.0 | 3458 | 3.0861 | 1.0 | 14.5252 | 2.3687 |
| 3.3062 | 39.0 | 3549 | 3.0738 | 1.0 | 14.3299 | 2.3899 |
| 3.3033 | 40.0 | 3640 | 3.0772 | 1.0 | 14.8870 | 2.4320 |
| 3.2496 | 41.0 | 3731 | 3.0690 | 1.0 | 15.3802 | 2.4647 |
| 3.2566 | 42.0 | 3822 | 3.0659 | 1.0 | 15.5431 | 2.4617 |
| 3.2492 | 43.0 | 3913 | 3.0568 | 1.0 | 15.3644 | 2.5582 |
| 3.2006 | 44.0 | 4004 | 3.0567 | 1.0 | 15.2988 | 2.5684 |
| 3.189 | 45.0 | 4095 | 3.0512 | 1.0 | 14.0501 | 2.6085 |
| 3.1572 | 46.0 | 4186 | 3.0428 | 1.0 | 14.6061 | 2.5761 |
| 3.1075 | 47.0 | 4277 | 3.0340 | 1.0 | 15.0633 | 2.5948 |
| 3.1011 | 48.0 | 4368 | 3.0314 | 1.0 | 15.0853 | 2.6284 |
| 3.0978 | 49.0 | 4459 | 3.0329 | 1.0 | 15.1390 | 2.5948 |
| 3.0689 | 50.0 | 4550 | 3.0263 | 1.0 | 15.6265 | 2.6435 |
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1