# 3798b2451aed3cbf6dd09863bbcc2b53
This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [it-ru] dataset. It achieves the following results on the evaluation set:
- Loss: 1.3212
- Data Size: 1.0
- Epoch Runtime: 102.9030
- Bleu: 11.4523
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
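The per-device and total batch sizes above are related by simple multiplication across the 4 GPUs. A minimal sketch, assuming a gradient accumulation factor of 1 (this card does not list one):

```python
# Relationship between the per-device and total batch sizes listed above.
train_batch_size = 8             # per device
num_devices = 4                  # multi-GPU
gradient_accumulation_steps = 1  # assumed; not stated on this card

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 32, matching total_train_batch_size above
```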
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 18.7101 | 0 | 8.1375 | 0.0070 |
| No log | 1 | 447 | 15.5222 | 0.0078 | 8.8450 | 0.0112 |
| 0.2642 | 2 | 894 | 15.0198 | 0.0156 | 10.1062 | 0.0104 |
| 0.346 | 3 | 1341 | 15.2283 | 0.0312 | 12.6545 | 0.0085 |
| 0.5442 | 4 | 1788 | 8.9476 | 0.0625 | 15.6935 | 0.0130 |
| 0.8016 | 5 | 2235 | 6.7790 | 0.125 | 21.4150 | 0.0207 |
| 6.8299 | 6 | 2682 | 3.6551 | 0.25 | 32.0205 | 0.3783 |
| 3.2328 | 7 | 3129 | 2.0647 | 0.5 | 54.0593 | 4.1573 |
| 2.3429 | 8 | 3576 | 1.6954 | 1.0 | 97.2096 | 6.3831 |
| 2.1504 | 9 | 4023 | 1.5869 | 1.0 | 95.6565 | 7.4095 |
| 1.9874 | 10 | 4470 | 1.5282 | 1.0 | 96.8454 | 7.9612 |
| 1.88 | 11 | 4917 | 1.4898 | 1.0 | 96.9167 | 8.5106 |
| 1.8172 | 12 | 5364 | 1.4523 | 1.0 | 99.1246 | 8.8419 |
| 1.691 | 13 | 5811 | 1.4316 | 1.0 | 97.7661 | 9.2087 |
| 1.6377 | 14 | 6258 | 1.4068 | 1.0 | 96.0381 | 9.4061 |
| 1.6041 | 15 | 6705 | 1.3921 | 1.0 | 99.0179 | 9.9002 |
| 1.5197 | 16 | 7152 | 1.3805 | 1.0 | 100.8643 | 9.9666 |
| 1.4845 | 17 | 7599 | 1.3618 | 1.0 | 100.6985 | 10.2203 |
| 1.4617 | 18 | 8046 | 1.3472 | 1.0 | 102.0336 | 10.3916 |
| 1.4036 | 19 | 8493 | 1.3414 | 1.0 | 103.3909 | 10.5841 |
| 1.3769 | 20 | 8940 | 1.3384 | 1.0 | 104.4361 | 10.7687 |
| 1.3606 | 21 | 9387 | 1.3333 | 1.0 | 103.3733 | 10.7833 |
| 1.2803 | 22 | 9834 | 1.3284 | 1.0 | 103.2838 | 10.7905 |
| 1.2383 | 23 | 10281 | 1.3167 | 1.0 | 103.3366 | 10.9550 |
| 1.2123 | 24 | 10728 | 1.3226 | 1.0 | 104.1142 | 10.9717 |
| 1.1761 | 25 | 11175 | 1.3208 | 1.0 | 103.8856 | 11.0127 |
| 1.1347 | 26 | 11622 | 1.3156 | 1.0 | 103.1935 | 11.1601 |
| 1.1619 | 27 | 12069 | 1.3206 | 1.0 | 101.5619 | 11.2488 |
| 1.0916 | 28 | 12516 | 1.3234 | 1.0 | 103.9717 | 11.1902 |
| 1.0614 | 29 | 12963 | 1.3304 | 1.0 | 103.1075 | 11.1793 |
| 1.0344 | 30 | 13410 | 1.3212 | 1.0 | 102.9030 | 11.4523 |
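Note that the best validation loss and the best BLEU score fall on different epochs. A minimal sketch of scanning the log for both, using a hand-copied subset of the late full-data rows from the table above:

```python
# (epoch, validation_loss, bleu), hand-copied from the last rows of the table above.
log = [
    (23, 1.3167, 10.9550),
    (24, 1.3226, 10.9717),
    (25, 1.3208, 11.0127),
    (26, 1.3156, 11.1601),
    (27, 1.3206, 11.2488),
    (28, 1.3234, 11.1902),
    (29, 1.3304, 11.1793),
    (30, 1.3212, 11.4523),
]

best_loss_epoch = min(log, key=lambda row: row[1])[0]  # lowest validation loss
best_bleu_epoch = max(log, key=lambda row: row[2])[0]  # highest BLEU

print(best_loss_epoch, best_bleu_epoch)  # 26 30
```

Which checkpoint to prefer depends on whether validation loss or BLEU is the metric that matters for the downstream use.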
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
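To approximate this environment, the versions above could be pinned in a requirements file. This is a sketch only: the `+cu128` build of PyTorch listed above is served from the PyTorch package index, so a plain `torch==2.8.0` pin may resolve to a different CUDA build.

```text
transformers==4.57.0
torch==2.8.0
datasets==4.2.0
tokenizers==0.22.1
```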