45cde632e48083150b2baa2c26179d42

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [de-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5995
  • Data Size: 1.0
  • Epoch Runtime: 6.7684
  • Bleu: 3.2270
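The BLEU figures in this card follow the usual 0–100 convention, so 3.2270 means roughly 3.2% n-gram overlap quality. As a reminder of what the metric measures, here is a minimal, unsmoothed sentence-level BLEU sketch in pure Python (the actual evaluation presumably uses a corpus-level implementation such as sacreBLEU, which also applies smoothing and detokenization; this is only an illustration of the formula):

```python
from collections import Counter
import math

def bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU on a 0-1 scale: geometric mean of
    1..max_n n-gram precisions, times a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped matches
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # without smoothing, any empty n-gram match zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(log_avg)
```

Multiply the result by 100 to compare against the scores reported in this card.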

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
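The per-device and total batch sizes above are consistent: with 4 GPUs and a per-device batch of 8, the effective batch is 32. Combined with the 27 optimizer steps per epoch visible in the results table below (at full data size), that also bounds the size of the otherwise-undocumented training split. A quick check of this arithmetic:

```python
# Values taken from the hyperparameter list above
train_batch_size = 8   # per device
num_devices = 4
total_train_batch_size = train_batch_size * num_devices  # 32, as reported

# The training log shows 27 optimizer steps per full-data epoch, which
# bounds the (unstated) training-set size: the last step may be partial.
steps_per_epoch = 27
min_examples = (steps_per_epoch - 1) * total_train_batch_size + 1  # 833
max_examples = steps_per_epoch * total_train_batch_size            # 864
```

So the de-pt training split contains between 833 and 864 sentence pairs, which helps explain the modest final BLEU.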

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 31.3452         | 0         | 1.1404        | 0.0044 |
| No log        | 1     | 27   | 30.9914         | 0.0078    | 1.5898        | 0.0061 |
| No log        | 2     | 54   | 30.3526         | 0.0156    | 1.6757        | 0.0056 |
| No log        | 3     | 81   | 29.6529         | 0.0312    | 2.0404        | 0.0071 |
| No log        | 4     | 108  | 28.9018         | 0.0625    | 2.4687        | 0.0045 |
| No log        | 5     | 135  | 25.1327         | 0.125     | 2.9368        | 0.0076 |
| No log        | 6     | 162  | 19.7554         | 0.25      | 3.2487        | 0.0060 |
| No log        | 7     | 189  | 15.7126         | 0.5       | 4.3466        | 0.0094 |
| 4.4185        | 8.0   | 216  | 12.0272         | 1.0       | 6.4605        | 0.0154 |
| 4.4185        | 9.0   | 243  | 9.8194          | 1.0       | 5.9703        | 0.0141 |
| 13.998        | 10.0  | 270  | 8.0210          | 1.0       | 5.9067        | 0.0107 |
| 13.998        | 11.0  | 297  | 6.6757          | 1.0       | 5.9823        | 0.0119 |
| 9.7665        | 12.0  | 324  | 5.7102          | 1.0       | 6.4256        | 0.0096 |
| 7.2327        | 13.0  | 351  | 5.1367          | 1.0       | 6.5897        | 0.0204 |
| 7.2327        | 14.0  | 378  | 4.4280          | 1.0       | 6.8430        | 0.0239 |
| 5.7897        | 15.0  | 405  | 3.7653          | 1.0       | 5.3993        | 0.0450 |
| 5.7897        | 16.0  | 432  | 3.5210          | 1.0       | 5.7351        | 0.4392 |
| 4.8733        | 17.0  | 459  | 3.3821          | 1.0       | 5.5197        | 0.7493 |
| 4.8733        | 18.0  | 486  | 3.2930          | 1.0       | 5.6848        | 0.9425 |
| 4.4002        | 19.0  | 513  | 3.2074          | 1.0       | 6.0612        | 1.0981 |
| 4.4002        | 20.0  | 540  | 3.1383          | 1.0       | 6.3865        | 1.1585 |
| 4.1301        | 21.0  | 567  | 3.0881          | 1.0       | 7.0542        | 1.1100 |
| 4.1301        | 22.0  | 594  | 3.0515          | 1.0       | 6.7974        | 1.2151 |
| 3.9187        | 23.0  | 621  | 3.0005          | 1.0       | 6.9739        | 1.2858 |
| 3.9187        | 24.0  | 648  | 2.9695          | 1.0       | 7.2571        | 1.5159 |
| 3.7751        | 25.0  | 675  | 2.9338          | 1.0       | 7.1805        | 1.5693 |
| 3.6794        | 26.0  | 702  | 2.9014          | 1.0       | 7.6884        | 1.6659 |
| 3.6794        | 27.0  | 729  | 2.8833          | 1.0       | 7.4668        | 1.7657 |
| 3.5659        | 28.0  | 756  | 2.8624          | 1.0       | 8.1261        | 1.8383 |
| 3.5659        | 29.0  | 783  | 2.8419          | 1.0       | 8.3726        | 1.9546 |
| 3.486         | 30.0  | 810  | 2.8249          | 1.0       | 8.2575        | 2.0172 |
| 3.486         | 31.0  | 837  | 2.8029          | 1.0       | 5.4925        | 2.0143 |
| 3.4077        | 32.0  | 864  | 2.7929          | 1.0       | 5.7737        | 2.2607 |
| 3.4077        | 33.0  | 891  | 2.7721          | 1.0       | 5.7768        | 2.3054 |
| 3.3307        | 34.0  | 918  | 2.7613          | 1.0       | 6.1863        | 2.3481 |
| 3.3307        | 35.0  | 945  | 2.7427          | 1.0       | 6.3883        | 2.3710 |
| 3.2612        | 36.0  | 972  | 2.7318          | 1.0       | 6.9401        | 2.3938 |
| 3.2612        | 37.0  | 999  | 2.7155          | 1.0       | 6.7833        | 2.4118 |
| 3.217         | 38.0  | 1026 | 2.7033          | 1.0       | 6.6972        | 2.4964 |
| 3.1534        | 39.0  | 1053 | 2.6954          | 1.0       | 6.9671        | 2.4831 |
| 3.1534        | 40.0  | 1080 | 2.6767          | 1.0       | 7.3655        | 2.6130 |
| 3.1121        | 41.0  | 1107 | 2.6673          | 1.0       | 7.4479        | 2.7751 |
| 3.1121        | 42.0  | 1134 | 2.6608          | 1.0       | 8.0463        | 2.8065 |
| 3.0758        | 43.0  | 1161 | 2.6494          | 1.0       | 8.4453        | 2.8540 |
| 3.0758        | 44.0  | 1188 | 2.6424          | 1.0       | 7.9917        | 2.8893 |
| 3.0036        | 45.0  | 1215 | 2.6317          | 1.0       | 8.0109        | 2.9841 |
| 3.0036        | 46.0  | 1242 | 2.6262          | 1.0       | 5.5405        | 3.0465 |
| 2.9671        | 47.0  | 1269 | 2.6213          | 1.0       | 5.4395        | 3.1307 |
| 2.9671        | 48.0  | 1296 | 2.6121          | 1.0       | 5.8908        | 3.1183 |
| 2.9548        | 49.0  | 1323 | 2.6074          | 1.0       | 6.8068        | 3.1747 |
| 2.897         | 50.0  | 1350 | 2.5995          | 1.0       | 6.7684        | 3.2270 |
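The Data Size column suggests a warm-up-style data schedule: training appears to start on 1/128 of the training set and double the fraction each epoch until the full set is reached at epoch 8. The exact mechanism is not documented in this card, but the reported fractions are reproduced by this hypothetical schedule:

```python
def data_fraction(epoch: int) -> float:
    """Hypothetical reconstruction of the Data Size schedule from the
    results table: epoch 0 is an initial evaluation on no training data,
    then the fraction doubles from 1/128 per epoch, capped at the full set."""
    if epoch == 0:
        return 0.0
    return min(1.0, 2 ** (epoch - 1) / 128)
```

This matches the table's values (0.0078 ≈ 1/128 at epoch 1, 0.5 at epoch 7, 1.0 from epoch 8 onward), but it is an inference from the log, not a documented training detail.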

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1