4d4d7cb2f1bfc96c25e98c54d0997d8a
This model is a fine-tuned version of google/mt5-small on the en-es (English-Spanish) configuration of the Helsinki-NLP/opus_books dataset. After the final training epoch (epoch 50, step 116800), it achieves the following results on the evaluation set:
- Loss: 1.8516
- Data Size: 1.0
- Epoch Runtime: 327.4974
- Bleu: 8.2038
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
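The relationship between the per-device and total batch sizes listed above can be checked with a short sketch. The dictionary keys below are the Hugging Face `TrainingArguments` field names that correspond to the listed hyperparameters; this mapping is an assumption about how the run was configured, not the actual training script. The helper `effective_batch_size` and the gradient-accumulation factor of 1 are likewise illustrative assumptions (the card lists no accumulation setting, and 8 x 4 already equals the reported total of 32).

```python
# Hyperparameters copied from the list above. The keys are the matching
# TrainingArguments field names (assumed mapping, not the author's script).
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4  # num_devices from the list above


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Hypothetical helper: examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


total_train = effective_batch_size(training_kwargs["per_device_train_batch_size"], NUM_DEVICES)
total_eval = effective_batch_size(training_kwargs["per_device_eval_batch_size"], NUM_DEVICES)

print(total_train, total_eval)  # 32 32, matching total_train_batch_size and total_eval_batch_size
```

If the `transformers` library is installed, the same dictionary could in principle be unpacked into `Seq2SeqTrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is a placeholder you would choose yourself.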
Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0.0 | 0 | 26.2854 | 0 | 27.9145 | 0.0017 |
| No log | 1.0 | 2336 | 18.9281 | 0.0078 | 32.4046 | 0.0018 |
| 0.3263 | 2.0 | 4672 | 11.7900 | 0.0156 | 34.2716 | 0.0036 |
| 0.3274 | 3.0 | 7008 | 6.1686 | 0.0312 | 39.1650 | 0.0218 |
| 5.361 | 4.0 | 9344 | 3.6553 | 0.0625 | 49.7337 | 0.0868 |
| 4.2979 | 5.0 | 11680 | 3.1433 | 0.125 | 68.3465 | 1.1067 |
| 3.7824 | 6.0 | 14016 | 2.9144 | 0.25 | 106.6005 | 1.7041 |
| 3.4797 | 7.0 | 16352 | 2.7169 | 0.5 | 183.0667 | 2.3146 |
| 3.1554 | 8.0 | 18688 | 2.5379 | 1.0 | 336.3692 | 3.7288 |
| 2.9481 | 9.0 | 21024 | 2.4301 | 1.0 | 335.9351 | 4.3518 |
| 2.8347 | 10.0 | 23360 | 2.3613 | 1.0 | 328.1369 | 4.9596 |
| 2.756 | 11.0 | 25696 | 2.3131 | 1.0 | 327.7411 | 5.2984 |
| 2.6544 | 12.0 | 28032 | 2.2728 | 1.0 | 329.1862 | 5.5852 |
| 2.5893 | 13.0 | 30368 | 2.2285 | 1.0 | 330.7622 | 5.7019 |
| 2.5538 | 14.0 | 32704 | 2.1961 | 1.0 | 330.6755 | 5.7741 |
| 2.4931 | 15.0 | 35040 | 2.1711 | 1.0 | 335.6781 | 6.0698 |
| 2.4965 | 16.0 | 37376 | 2.1462 | 1.0 | 333.3473 | 6.1122 |
| 2.4119 | 17.0 | 39712 | 2.1221 | 1.0 | 333.6272 | 6.2744 |
| 2.4059 | 18.0 | 42048 | 2.1039 | 1.0 | 332.4366 | 6.4836 |
| 2.3692 | 19.0 | 44384 | 2.0904 | 1.0 | 329.6484 | 6.5465 |
| 2.3331 | 20.0 | 46720 | 2.0651 | 1.0 | 330.2129 | 6.7108 |
| 2.2888 | 21.0 | 49056 | 2.0572 | 1.0 | 332.0176 | 6.7480 |
| 2.2682 | 22.0 | 51392 | 2.0413 | 1.0 | 332.2873 | 6.7908 |
| 2.2431 | 23.0 | 53728 | 2.0299 | 1.0 | 330.3732 | 6.9255 |
| 2.1991 | 24.0 | 56064 | 2.0156 | 1.0 | 331.4767 | 7.0141 |
| 2.2014 | 25.0 | 58400 | 2.0036 | 1.0 | 330.0735 | 7.1629 |
| 2.1636 | 26.0 | 60736 | 1.9884 | 1.0 | 329.7091 | 7.2376 |
| 2.1585 | 27.0 | 63072 | 1.9843 | 1.0 | 331.1367 | 7.1811 |
| 2.1246 | 28.0 | 65408 | 1.9690 | 1.0 | 332.0541 | 7.2945 |
| 2.1024 | 29.0 | 67744 | 1.9667 | 1.0 | 329.9478 | 7.4214 |
| 2.0918 | 30.0 | 70080 | 1.9618 | 1.0 | 332.9395 | 7.4096 |
| 2.0791 | 31.0 | 72416 | 1.9488 | 1.0 | 336.1133 | 7.5346 |
| 2.0916 | 32.0 | 74752 | 1.9397 | 1.0 | 330.8274 | 7.5941 |
| 2.0439 | 33.0 | 77088 | 1.9333 | 1.0 | 330.4461 | 7.5373 |
| 2.0485 | 34.0 | 79424 | 1.9321 | 1.0 | 329.3917 | 7.6020 |
| 2.0266 | 35.0 | 81760 | 1.9202 | 1.0 | 331.9756 | 7.7666 |
| 2.0335 | 36.0 | 84096 | 1.9151 | 1.0 | 329.3808 | 7.8134 |
| 1.9908 | 37.0 | 86432 | 1.9068 | 1.0 | 331.5035 | 7.7518 |
| 2.0021 | 38.0 | 88768 | 1.9069 | 1.0 | 334.6303 | 7.8101 |
| 2.0001 | 39.0 | 91104 | 1.8963 | 1.0 | 328.9622 | 7.8611 |
| 1.9735 | 40.0 | 93440 | 1.8955 | 1.0 | 328.4680 | 7.8629 |
| 1.9618 | 41.0 | 95776 | 1.8912 | 1.0 | 336.6663 | 7.9950 |
| 1.9276 | 42.0 | 98112 | 1.8850 | 1.0 | 328.1633 | 7.9522 |
| 1.9324 | 43.0 | 100448 | 1.8807 | 1.0 | 331.0138 | 7.9116 |
| 1.8694 | 44.0 | 102784 | 1.8756 | 1.0 | 330.6471 | 8.0294 |
| 1.9165 | 45.0 | 105120 | 1.8682 | 1.0 | 331.4836 | 8.1219 |
| 1.8714 | 46.0 | 107456 | 1.8655 | 1.0 | 329.9200 | 8.1032 |
| 1.8841 | 47.0 | 109792 | 1.8599 | 1.0 | 326.8646 | 8.1485 |
| 1.855 | 48.0 | 112128 | 1.8633 | 1.0 | 331.9095 | 8.1450 |
| 1.8642 | 49.0 | 114464 | 1.8543 | 1.0 | 331.2229 | 8.1218 |
| 1.8601 | 50.0 | 116800 | 1.8516 | 1.0 | 327.4974 | 8.2038 |
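The Data Size column in the table above appears to double each epoch until it reaches 1.0 at epoch 8. The sketch below reproduces that pattern; it is an observation fitted to the logged values, not a documented description of the training script, and the names `FULL_DATA_EPOCH` and `data_fraction` are hypothetical.

```python
FULL_DATA_EPOCH = 8  # first epoch logged with Data Size 1.0 in the table


def data_fraction(epoch: int) -> float:
    """Assumed schedule: the fraction doubles each epoch and is capped at 1.0."""
    if epoch <= 0:
        return 0.0  # epoch 0 is the evaluation logged before any training
    return min(1.0, 2.0 ** (epoch - FULL_DATA_EPOCH))


# Data Size values as logged in the table for epochs 0 through 9.
logged = {0: 0.0, 1: 0.0078, 2: 0.0156, 3: 0.0312, 4: 0.0625,
          5: 0.125, 6: 0.25, 7: 0.5, 8: 1.0, 9: 1.0}

for epoch, value in logged.items():
    # The table rounds to four decimals, so compare with a small tolerance.
    assert abs(data_fraction(epoch) - value) < 1e-4

# Optimizer steps per epoch, derived from the final row (step 116800 at epoch 50).
steps_per_epoch = 116800 // 50
print(steps_per_epoch)  # 2336
```

Under this reading, the early rows (epochs 1 to 7) were trained on a small fraction of the data, which is consistent with their much shorter Epoch Runtime values.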
Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
Model tree for contemmcm/4d4d7cb2f1bfc96c25e98c54d0997d8a
- Base model: google/mt5-small