b075eb4a9e29a1e6571d7047473604c5
This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [fr-nl] dataset. It achieves the following results on the evaluation set:
- Loss: 1.6085
- Data Size: 1.0
- Epoch Runtime: 209.5152
- Bleu: 9.7936
Model description
More information needed
Intended uses & limitations
More information needed
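
As a rough illustration of intended use, the sketch below loads the checkpoint with the Transformers API and translates a French sentence into Dutch. The repository id and the absence of a task prefix are assumptions; the exact input format used during fine-tuning is not documented in this card.

```python
# Hedged usage sketch: fr -> nl translation with this fine-tuned mT5 checkpoint.
# The repo id below is taken from the model tree; the plain (prefix-free) input
# format is an assumption, since the fine-tuning prompt is not documented.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/b075eb4a9e29a1e6571d7047473604c5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Je suis heureux de vous voir."  # French source sentence
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```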
Training and evaluation data
More information needed
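
The base corpus named above can be loaded with the datasets library. The 90/10 held-out split below is an assumption for illustration only; the card does not document how the evaluation set was actually produced.

```python
# Sketch of loading the fr-nl pair of opus_books; the corpus ships a single
# "train" split, so the evaluation split here is an assumed 90/10 cut.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "fr-nl")
# Each example looks like {"id": ..., "translation": {"fr": ..., "nl": ...}}
split = raw["train"].train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
print(train_ds[0]["translation"])
```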
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a Seq2SeqTrainingArguments sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
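
A minimal Seq2SeqTrainingArguments sketch mirroring the values above, assuming a standard Transformers Seq2SeqTrainer setup. The output_dir and predict_with_generate values are placeholders/assumptions, and the progressive data-size schedule visible in the results table is not covered here.

```python
# Sketch only: hyperparameters from the list above mapped onto
# Seq2SeqTrainingArguments; anything not listed in the card is a placeholder.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-opus-books-fr-nl",  # placeholder, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumption: needed to compute BLEU during eval
)
```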
Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0.0 | 0 | 15.4205 | 0 | 17.2439 | 0.0154 |
| No log | 1.0 | 1000 | 13.9017 | 0.0078 | 19.7549 | 0.0194 |
| No log | 2.0 | 2000 | 11.4029 | 0.0156 | 20.5286 | 0.0212 |
| No log | 3.0 | 3000 | 10.3222 | 0.0312 | 24.3867 | 0.0179 |
| 0.4118 | 4.0 | 4000 | 5.8715 | 0.0625 | 31.3563 | 0.0230 |
| 4.4572 | 5.0 | 5000 | 2.9395 | 0.125 | 42.7468 | 2.0196 |
| 0.2117 | 6.0 | 6000 | 2.5269 | 0.25 | 66.7118 | 3.8018 |
| 0.2683 | 7.0 | 7000 | 2.3041 | 0.5 | 116.0491 | 4.8542 |
| 2.6311 | 8.0 | 8000 | 2.1122 | 1.0 | 210.4427 | 5.7843 |
| 2.4335 | 9.0 | 9000 | 2.0011 | 1.0 | 207.4353 | 6.4811 |
| 2.2967 | 10.0 | 10000 | 1.9233 | 1.0 | 207.8650 | 6.9229 |
| 2.2179 | 11.0 | 11000 | 1.8646 | 1.0 | 207.1371 | 7.3574 |
| 2.1122 | 12.0 | 12000 | 1.8194 | 1.0 | 206.8493 | 7.7190 |
| 2.0336 | 13.0 | 13000 | 1.7927 | 1.0 | 207.7914 | 7.9566 |
| 1.9738 | 14.0 | 14000 | 1.7589 | 1.0 | 210.8171 | 8.2123 |
| 1.8977 | 15.0 | 15000 | 1.7363 | 1.0 | 209.6291 | 8.3925 |
| 1.8554 | 16.0 | 16000 | 1.7115 | 1.0 | 210.7359 | 8.5549 |
| 1.7706 | 17.0 | 17000 | 1.6998 | 1.0 | 211.4426 | 8.7277 |
| 1.7873 | 18.0 | 18000 | 1.6766 | 1.0 | 211.3672 | 8.8573 |
| 1.7265 | 19.0 | 19000 | 1.6663 | 1.0 | 209.6792 | 8.9481 |
| 1.672 | 20.0 | 20000 | 1.6533 | 1.0 | 210.9265 | 9.1015 |
| 1.6525 | 21.0 | 21000 | 1.6385 | 1.0 | 210.8974 | 9.1918 |
| 1.6174 | 22.0 | 22000 | 1.6344 | 1.0 | 211.5922 | 9.3196 |
| 1.5776 | 23.0 | 23000 | 1.6296 | 1.0 | 212.8005 | 9.3744 |
| 1.5264 | 24.0 | 24000 | 1.6211 | 1.0 | 209.8647 | 9.4105 |
| 1.5552 | 25.0 | 25000 | 1.6154 | 1.0 | 210.6224 | 9.5693 |
| 1.4744 | 26.0 | 26000 | 1.6045 | 1.0 | 209.0745 | 9.5901 |
| 1.4567 | 27.0 | 27000 | 1.6080 | 1.0 | 209.5370 | 9.6895 |
| 1.4102 | 28.0 | 28000 | 1.6087 | 1.0 | 209.9979 | 9.6693 |
| 1.4109 | 29.0 | 29000 | 1.6057 | 1.0 | 209.9585 | 9.7511 |
| 1.3673 | 30.0 | 30000 | 1.6085 | 1.0 | 209.5152 | 9.7936 |
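
The Bleu column can in principle be reproduced with sacrebleu through the evaluate library, as in the hedged sketch below; the exact metric configuration used for this card is not documented, so treat the reported numbers as indicative.

```python
# Hedged sketch of corpus-level BLEU scoring on decoded model outputs.
import evaluate

sacrebleu = evaluate.load("sacrebleu")
predictions = ["Ik ben blij je te zien."]        # decoded model outputs (Dutch), example only
references = [["Ik ben blij om je te zien."]]    # one list of references per prediction
result = sacrebleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))                 # corpus-level BLEU
```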
Framework versions
- Transformers 4.57.0
- PyTorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1
Model tree for contemmcm/b075eb4a9e29a1e6571d7047473604c5
- Base model: google/mt5-base