# 83ca2b931d222347c8424b603646260c
This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [de-ru] dataset (German to Russian translation); a brief usage sketch follows the results below. It achieves the following results on the evaluation set:
- Loss: 1.5587
- Data Size: 1.0
- Epoch Runtime: 94.9043
- Bleu: 8.9133
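
The card does not ship a usage example, so the following is a minimal sketch for loading this checkpoint with the transformers library, assuming the repository ID shown on the model page. Whether a task prefix (e.g. "translate German to Russian: ") was used during fine-tuning is not documented; it may need to be prepended to the input.

```python
# Minimal usage sketch (not part of the original card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/83ca2b931d222347c8424b603646260c"  # repository ID from the model page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example German sentence (placeholder); add a task prefix here if the
# fine-tuning script used one, which the card does not specify.
text = "Der kleine Prinz setzte sich auf einen Stein."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```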
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the reproduction sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
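
The original training script is not included in this card. The sketch below is a hedged mapping of the hyperparameters above onto the Hugging Face Seq2SeqTrainer API: the train/eval split, the tokenization settings (max_length=128), and eval_strategy="epoch" are assumptions, and the progressive data schedule implied by the Data Size column in the results table is not reproduced here.

```python
# Hedged reproduction sketch; not the authors' exact training script.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")

# opus_books [de-ru] only ships a "train" split; the evaluation split behind the
# card's numbers is undocumented, so a simple held-out split is used here.
raw = load_dataset("Helsinki-NLP/opus_books", "de-ru")["train"].train_test_split(
    test_size=0.2, seed=42
)

def preprocess(batch):
    sources = [pair["de"] for pair in batch["translation"]]
    targets = [pair["ru"] for pair in batch["translation"]]
    return tokenizer(sources, text_target=targets, max_length=128, truncation=True)

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-opus-books-de-ru",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    num_train_epochs=50,
    lr_scheduler_type="constant",
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are the defaults
    eval_strategy="epoch",           # assumption: evaluate once per epoch, as in the table
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```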
### Training results

The Data Size column gives the fraction of the training set used in that epoch; the schedule ramps up until the full dataset is used from epoch 8 onward. Validation loss and BLEU are reported on the evaluation set after each epoch, and Epoch Runtime appears to be in seconds. An evaluation sketch follows the table.
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 17.9214 | 0 | 8.1126 | 0.0074 |
| No log | 1 | 434 | 19.1442 | 0.0078 | 9.9087 | 0.0056 |
| No log | 2 | 868 | 15.3624 | 0.0156 | 9.7936 | 0.0099 |
| No log | 3 | 1302 | 9.9556 | 0.0312 | 11.7583 | 0.0137 |
| No log | 4 | 1736 | 7.2914 | 0.0625 | 14.6846 | 0.0149 |
| 0.467 | 5 | 2170 | 3.8665 | 0.125 | 20.3708 | 0.0821 |
| 3.9708 | 6 | 2604 | 2.4223 | 0.25 | 30.1851 | 2.6894 |
| 2.9952 | 7 | 3038 | 2.1053 | 0.5 | 51.0852 | 3.9262 |
| 2.596 | 8.0 | 3472 | 1.9132 | 1.0 | 93.1141 | 4.8514 |
| 2.3711 | 9.0 | 3906 | 1.8187 | 1.0 | 93.5745 | 5.4804 |
| 2.23 | 10.0 | 4340 | 1.7574 | 1.0 | 93.0811 | 6.0059 |
| 2.1331 | 11.0 | 4774 | 1.7146 | 1.0 | 93.0832 | 6.5665 |
| 2.0062 | 12.0 | 5208 | 1.6797 | 1.0 | 93.8144 | 6.7421 |
| 1.9482 | 13.0 | 5642 | 1.6533 | 1.0 | 93.8417 | 7.1525 |
| 1.873 | 14.0 | 6076 | 1.6290 | 1.0 | 93.1289 | 7.1184 |
| 1.8153 | 15.0 | 6510 | 1.6088 | 1.0 | 93.7660 | 7.4296 |
| 1.7606 | 16.0 | 6944 | 1.5983 | 1.0 | 93.3560 | 7.7978 |
| 1.7148 | 17.0 | 7378 | 1.5834 | 1.0 | 93.1155 | 7.9990 |
| 1.6777 | 18.0 | 7812 | 1.5761 | 1.0 | 94.9056 | 8.1409 |
| 1.5887 | 19.0 | 8246 | 1.5651 | 1.0 | 93.7158 | 8.1161 |
| 1.5872 | 20.0 | 8680 | 1.5490 | 1.0 | 94.3458 | 8.2935 |
| 1.534 | 21.0 | 9114 | 1.5488 | 1.0 | 95.2056 | 8.4408 |
| 1.4803 | 22.0 | 9548 | 1.5472 | 1.0 | 93.1390 | 8.4442 |
| 1.4652 | 23.0 | 9982 | 1.5434 | 1.0 | 93.4683 | 8.5657 |
| 1.4289 | 24.0 | 10416 | 1.5374 | 1.0 | 93.5374 | 8.6401 |
| 1.3991 | 25.0 | 10850 | 1.5361 | 1.0 | 94.5638 | 8.6876 |
| 1.3638 | 26.0 | 11284 | 1.5435 | 1.0 | 93.9030 | 8.7042 |
| 1.3452 | 27.0 | 11718 | 1.5347 | 1.0 | 92.9642 | 8.7730 |
| 1.2729 | 28.0 | 12152 | 1.5374 | 1.0 | 93.2383 | 8.7727 |
| 1.2714 | 29.0 | 12586 | 1.5336 | 1.0 | 93.6202 | 8.8632 |
| 1.2404 | 30.0 | 13020 | 1.5326 | 1.0 | 93.4600 | 8.8624 |
| 1.214 | 31.0 | 13454 | 1.5427 | 1.0 | 95.5668 | 8.8654 |
| 1.1871 | 32.0 | 13888 | 1.5489 | 1.0 | 94.9191 | 8.9337 |
| 1.1825 | 33.0 | 14322 | 1.5495 | 1.0 | 92.5991 | 8.9077 |
| 1.1405 | 34.0 | 14756 | 1.5587 | 1.0 | 94.9043 | 8.9133 |
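
The card does not state which BLEU implementation produced these scores. The sketch below uses the sacrebleu metric from the `evaluate` library as one plausible way to compute a comparable number; the source sentence and reference are placeholders, not the actual evaluation set.

```python
# Hedged BLEU evaluation sketch (placeholder data, assumed metric implementation).
import evaluate
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

bleu = evaluate.load("sacrebleu")

model_id = "contemmcm/83ca2b931d222347c8424b603646260c"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

sources = ["Der kleine Prinz setzte sich auf einen Stein."]   # placeholder inputs
references = [["Маленький принц сел на камень."]]             # one reference list per source

inputs = tokenizer(sources, return_tensors="pt", padding=True, truncation=True)
generated = model.generate(**inputs, max_new_tokens=64, num_beams=4)
predictions = tokenizer.batch_decode(generated, skip_special_tokens=True)

print(bleu.compute(predictions=predictions, references=references)["score"])
```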
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1