# 64584b7c6826bf6aa72d546ba8abf557
This model is a fine-tuned version of google/mt5-large on the Helsinki-NLP/opus_books [es-pt] dataset. It achieves the following results on the evaluation set:
- Loss: 1.7005
- Data Size: 1.0
- Epoch Runtime: 24.1667
- Bleu: 11.8501
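The reported BLEU (11.8501 at the final checkpoint) is a corpus-level n-gram overlap metric produced by the evaluation loop. As rough intuition for what it measures, here is a minimal single-sentence BLEU sketch in plain Python (smoothed geometric mean of n-gram precisions times a brevity penalty). It is illustrative only and will not reproduce the exact scores above, which come from the standard corpus-level implementation.

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Toy single-reference BLEU: smoothed n-gram precisions * brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        # Floor at a tiny value so a single zero precision does not zero the score.
        precisions.append(overlap / total if total and overlap else 1e-9)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100.0 * bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A perfect match scores 100; disjoint outputs score near 0, which is why the early epochs in the table below sit close to zero.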
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
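The total batch sizes above are derived quantities: the per-device batch size multiplied by the number of devices in the data-parallel setup. A quick sanity check:

```python
# Per-device settings from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 4

# With multi-GPU data parallelism, each device processes its own batch per
# step, so the effective batch size is per-device size * device count.
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 32 32
```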
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 24.2349 | 0 | 2.1315 | 0.0258 |
| No log | 1 | 33 | 23.5409 | 0.0078 | 2.5022 | 0.0259 |
| No log | 2 | 66 | 21.6421 | 0.0156 | 4.7012 | 0.0237 |
| No log | 3 | 99 | 22.7850 | 0.0312 | 7.1662 | 0.0176 |
| 1.2843 | 4 | 132 | 21.1930 | 0.0625 | 10.2092 | 0.0164 |
| 1.2843 | 5 | 165 | 20.8555 | 0.125 | 12.5009 | 0.0168 |
| 1.2843 | 6 | 198 | 17.9570 | 0.25 | 13.9833 | 0.0236 |
| 4.287 | 7 | 231 | 18.8896 | 0.5 | 15.6441 | 0.0204 |
| 14.796 | 8 | 264 | 14.5836 | 1.0 | 23.3535 | 0.0213 |
| 14.796 | 9 | 297 | 9.6964 | 1.0 | 22.6342 | 0.0273 |
| 17.6945 | 10 | 330 | 6.3427 | 1.0 | 20.0977 | 0.0227 |
| 9.8726 | 11 | 363 | 6.1424 | 1.0 | 20.8876 | 0.0207 |
| 9.8726 | 12 | 396 | 6.0423 | 1.0 | 21.3110 | 0.0182 |
| 8.0695 | 13 | 429 | 5.9941 | 1.0 | 21.3595 | 0.0174 |
| 7.5223 | 14 | 462 | 5.3786 | 1.0 | 20.5379 | 0.0597 |
| 7.5223 | 15 | 495 | 5.1967 | 1.0 | 21.3496 | 0.0828 |
| 6.7361 | 16 | 528 | 3.1778 | 1.0 | 22.7931 | 0.3027 |
| 4.6037 | 17 | 561 | 2.0157 | 1.0 | 20.7813 | 1.1779 |
| 4.6037 | 18 | 594 | 1.8168 | 1.0 | 22.1433 | 6.8359 |
| 2.4518 | 19 | 627 | 1.7521 | 1.0 | 23.4117 | 7.5031 |
| 1.9653 | 20 | 660 | 1.7062 | 1.0 | 20.7907 | 7.8407 |
| 1.9653 | 21 | 693 | 1.6971 | 1.0 | 20.7467 | 8.5446 |
| 1.743 | 22 | 726 | 1.6720 | 1.0 | 21.6457 | 9.1907 |
| 1.5908 | 23 | 759 | 1.6659 | 1.0 | 23.3958 | 9.8446 |
| 1.5908 | 24 | 792 | 1.6796 | 1.0 | 21.4205 | 10.2767 |
| 1.4478 | 25 | 825 | 1.6936 | 1.0 | 22.3413 | 10.6892 |
| 1.3382 | 26 | 858 | 1.7027 | 1.0 | 22.7737 | 11.3011 |
| 1.3382 | 27 | 891 | 1.7005 | 1.0 | 24.1667 | 11.8501 |
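The Data Size column suggests a doubling curriculum: training starts on roughly 1/128 of the dataset and the fraction doubles each epoch until the full dataset is used from epoch 8 onward (epoch 0 is the untrained baseline evaluation). A sketch that reproduces the reported fractions, assuming that schedule — the schedule function itself is an inference from the table, not confirmed by the training code:

```python
def data_fraction(epoch, full_data_epoch=8):
    """Fraction of the training set used at a given epoch, assuming the
    doubling schedule implied by the Data Size column."""
    if epoch == 0:
        return 0.0  # epoch 0 is the untrained baseline evaluation
    return min(1.0, 2.0 ** (epoch - full_data_epoch))

# Matches the table: 0.0078, 0.0156, ..., 0.5, then 1.0 from epoch 8 onward.
for epoch in range(10):
    print(epoch, round(data_fraction(epoch), 4))
```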
### Framework versions
- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1