# 13f9ad0fd0322d7cd9273f06b65b5233
This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [de-es] dataset. It achieves the following results on the evaluation set:
- Loss: 2.0161
- Data Size: 1.0
- Epoch Runtime: 144.7519
- Bleu: 5.7980
## Model description

More information needed
## Intended uses & limitations

More information needed
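The card does not include a usage snippet. A minimal, untested inference sketch with 🤗 Transformers follows, assuming the checkpoint is published under the repo id shown (German → Spanish); running it downloads the model, and any task prefix used during preprocessing is undocumented here:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id, taken from this card's title; adjust if the checkpoint lives elsewhere.
model_id = "contemmcm/13f9ad0fd0322d7cd9273f06b65b5233"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# mT5 fine-tunes usually take the raw source sentence; whether a task prefix
# (e.g. "translate German to Spanish: ") was used in training is not documented.
inputs = tokenizer("Das Haus ist alt.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```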
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
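The per-device and total batch sizes above are related by a simple product; a quick sanity check (gradient_accumulation_steps is assumed to be 1, since it is not listed in the card):

```python
train_batch_size = 8   # per device, from the hyperparameters above
num_devices = 4        # multi-GPU, from the hyperparameters above
grad_accum_steps = 1   # assumed; not listed in the card

total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
print(total_train_batch_size)  # 32, matching the reported total_train_batch_size
```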
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 15.6258 | 0 | 12.2547 | 0.0134 |
| No log | 1 | 688 | 12.4983 | 0.0078 | 13.7138 | 0.0155 |
| No log | 2 | 1376 | 11.5197 | 0.0156 | 14.9386 | 0.0127 |
| No log | 3 | 2064 | 10.6213 | 0.0312 | 17.9074 | 0.0106 |
| 0.4954 | 4 | 2752 | 9.6613 | 0.0625 | 22.3583 | 0.0081 |
| 0.8229 | 5 | 3440 | 6.1175 | 0.125 | 30.8167 | 0.0141 |
| 3.8555 | 6 | 4128 | 2.7627 | 0.25 | 47.7922 | 2.7902 |
| 3.2049 | 7 | 4816 | 2.4681 | 0.5 | 83.9102 | 3.3027 |
| 2.8805 | 8 | 5504 | 2.3248 | 1.0 | 149.4133 | 2.9320 |
| 2.7253 | 9 | 6192 | 2.2407 | 1.0 | 145.2460 | 3.9451 |
| 2.5951 | 10 | 6880 | 2.1922 | 1.0 | 146.1101 | 4.2159 |
| 2.5115 | 11 | 7568 | 2.1599 | 1.0 | 157.3725 | 4.4794 |
| 2.4214 | 12 | 8256 | 2.1309 | 1.0 | 155.4184 | 4.9169 |
| 2.3681 | 13 | 8944 | 2.1120 | 1.0 | 148.8502 | 4.9829 |
| 2.3350 | 14 | 9632 | 2.0946 | 1.0 | 148.5904 | 5.1887 |
| 2.2482 | 15 | 10320 | 2.0830 | 1.0 | 145.3452 | 5.2576 |
| 2.1906 | 16 | 11008 | 2.0633 | 1.0 | 145.2194 | 5.3111 |
| 2.1795 | 17 | 11696 | 2.0571 | 1.0 | 145.1608 | 5.5390 |
| 2.1068 | 18 | 12384 | 2.0420 | 1.0 | 145.6123 | 5.5147 |
| 2.1016 | 19 | 13072 | 2.0314 | 1.0 | 144.0503 | 5.6088 |
| 2.0614 | 20 | 13760 | 2.0222 | 1.0 | 146.8678 | 5.6972 |
| 2.0181 | 21 | 14448 | 2.0209 | 1.0 | 145.4855 | 5.7244 |
| 1.9832 | 22 | 15136 | 2.0253 | 1.0 | 145.5249 | 5.7423 |
| 1.9477 | 23 | 15824 | 2.0215 | 1.0 | 146.1704 | 5.6548 |
| 1.9223 | 24 | 16512 | 2.0203 | 1.0 | 145.2873 | 5.9267 |
| 1.9124 | 25 | 17200 | 2.0112 | 1.0 | 145.5622 | 5.7941 |
| 1.8305 | 26 | 17888 | 2.0095 | 1.0 | 144.1184 | 5.9749 |
| 1.8449 | 27 | 18576 | 2.0177 | 1.0 | 144.2892 | 5.8225 |
| 1.8013 | 28 | 19264 | 2.0191 | 1.0 | 145.0509 | 5.9746 |
| 1.7786 | 29 | 19952 | 2.0137 | 1.0 | 144.8962 | 5.9460 |
| 1.7274 | 30 | 20640 | 2.0161 | 1.0 | 144.7519 | 5.7980 |
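The Data Size column doubles each epoch, from roughly 1/128 of the training set at epoch 1 (0.0078 ≈ 1/128, 0.0156 ≈ 1/64, ...) until the full set is reached at epoch 8. A sketch of that schedule, inferred from the table rather than from any published training script:

```python
def data_fraction(epoch):
    """Fraction of the training set used at a given epoch (inferred from the table)."""
    if epoch == 0:
        return 0.0  # the epoch-0 row is an untrained baseline evaluation
    return min(1.0, 2 ** (epoch - 1) / 128)

for epoch in (1, 2, 7, 8, 30):
    print(epoch, round(data_fraction(epoch), 4))  # 0.0078, 0.0156, 0.5, 1.0, 1.0
```

This kind of progressive data scaling explains why "No log" appears for early training losses and why epoch runtime roughly doubles per epoch until epoch 8.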
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1