4d4d7cb2f1bfc96c25e98c54d0997d8a

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [en-es] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8516
  • Data Size: 1.0 (fraction of the training data used)
  • Epoch Runtime: 327.4974 seconds
  • Bleu: 8.2038

Model description

More information needed

Intended uses & limitations

More information needed
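
Given the dataset named in the summary, the model targets English-to-Spanish translation. Below is a minimal usage sketch with the transformers library; whether a task prefix (e.g. "translate English to Spanish: ") was used during fine-tuning is not documented on this card, so none is assumed, and the generation settings are illustrative.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id taken from this card.
model_id = "contemmcm/4d4d7cb2f1bfc96c25e98c54d0997d8a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# No task prefix assumed; add one here if the training setup used it.
inputs = tokenizer("The old man walked along the shore.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```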

Training and evaluation data

More information needed
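
The summary above identifies the data as Helsinki-NLP/opus_books [en-es]. A minimal sketch of loading it with the datasets library follows; the train/validation split and any preprocessing used for this run are not documented, so the access pattern shown is illustrative.

```python
from datasets import load_dataset

# The "en-es" config of opus_books ships a single "train" split;
# how it was partitioned for evaluation in this run is not documented.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-es")

# Each example is a {"translation": {"en": ..., "es": ...}} pair.
print(dataset["train"][0]["translation"])
```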

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
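
The list above maps onto transformers' Seq2SeqTrainingArguments roughly as in the sketch below. Only the values shown in the list come from this card; output_dir and predict_with_generate are illustrative assumptions, and the 4-GPU run would typically be launched with torchrun --nproc_per_node=4. Note that 4 devices × a per-device batch size of 8 gives the listed total batch size of 32.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-en-es",  # assumed name, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumed; needed to compute BLEU at eval time
)
```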

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|:-------------:|:-----:|:------:|:---------------:|:---------:|:-----------------:|:------:|
| No log | 0 | 0 | 26.2854 | 0 | 27.9145 | 0.0017 |
| No log | 1 | 2336 | 18.9281 | 0.0078 | 32.4046 | 0.0018 |
| 0.3263 | 2 | 4672 | 11.7900 | 0.0156 | 34.2716 | 0.0036 |
| 0.3274 | 3 | 7008 | 6.1686 | 0.0312 | 39.1650 | 0.0218 |
| 5.361 | 4 | 9344 | 3.6553 | 0.0625 | 49.7337 | 0.0868 |
| 4.2979 | 5 | 11680 | 3.1433 | 0.125 | 68.3465 | 1.1067 |
| 3.7824 | 6 | 14016 | 2.9144 | 0.25 | 106.6005 | 1.7041 |
| 3.4797 | 7 | 16352 | 2.7169 | 0.5 | 183.0667 | 2.3146 |
| 3.1554 | 8 | 18688 | 2.5379 | 1.0 | 336.3692 | 3.7288 |
| 2.9481 | 9 | 21024 | 2.4301 | 1.0 | 335.9351 | 4.3518 |
| 2.8347 | 10 | 23360 | 2.3613 | 1.0 | 328.1369 | 4.9596 |
| 2.756 | 11 | 25696 | 2.3131 | 1.0 | 327.7411 | 5.2984 |
| 2.6544 | 12 | 28032 | 2.2728 | 1.0 | 329.1862 | 5.5852 |
| 2.5893 | 13 | 30368 | 2.2285 | 1.0 | 330.7622 | 5.7019 |
| 2.5538 | 14 | 32704 | 2.1961 | 1.0 | 330.6755 | 5.7741 |
| 2.4931 | 15 | 35040 | 2.1711 | 1.0 | 335.6781 | 6.0698 |
| 2.4965 | 16 | 37376 | 2.1462 | 1.0 | 333.3473 | 6.1122 |
| 2.4119 | 17 | 39712 | 2.1221 | 1.0 | 333.6272 | 6.2744 |
| 2.4059 | 18 | 42048 | 2.1039 | 1.0 | 332.4366 | 6.4836 |
| 2.3692 | 19 | 44384 | 2.0904 | 1.0 | 329.6484 | 6.5465 |
| 2.3331 | 20 | 46720 | 2.0651 | 1.0 | 330.2129 | 6.7108 |
| 2.2888 | 21 | 49056 | 2.0572 | 1.0 | 332.0176 | 6.7480 |
| 2.2682 | 22 | 51392 | 2.0413 | 1.0 | 332.2873 | 6.7908 |
| 2.2431 | 23 | 53728 | 2.0299 | 1.0 | 330.3732 | 6.9255 |
| 2.1991 | 24 | 56064 | 2.0156 | 1.0 | 331.4767 | 7.0141 |
| 2.2014 | 25 | 58400 | 2.0036 | 1.0 | 330.0735 | 7.1629 |
| 2.1636 | 26 | 60736 | 1.9884 | 1.0 | 329.7091 | 7.2376 |
| 2.1585 | 27 | 63072 | 1.9843 | 1.0 | 331.1367 | 7.1811 |
| 2.1246 | 28 | 65408 | 1.9690 | 1.0 | 332.0541 | 7.2945 |
| 2.1024 | 29 | 67744 | 1.9667 | 1.0 | 329.9478 | 7.4214 |
| 2.0918 | 30 | 70080 | 1.9618 | 1.0 | 332.9395 | 7.4096 |
| 2.0791 | 31 | 72416 | 1.9488 | 1.0 | 336.1133 | 7.5346 |
| 2.0916 | 32 | 74752 | 1.9397 | 1.0 | 330.8274 | 7.5941 |
| 2.0439 | 33 | 77088 | 1.9333 | 1.0 | 330.4461 | 7.5373 |
| 2.0485 | 34 | 79424 | 1.9321 | 1.0 | 329.3917 | 7.6020 |
| 2.0266 | 35 | 81760 | 1.9202 | 1.0 | 331.9756 | 7.7666 |
| 2.0335 | 36 | 84096 | 1.9151 | 1.0 | 329.3808 | 7.8134 |
| 1.9908 | 37 | 86432 | 1.9068 | 1.0 | 331.5035 | 7.7518 |
| 2.0021 | 38 | 88768 | 1.9069 | 1.0 | 334.6303 | 7.8101 |
| 2.0001 | 39 | 91104 | 1.8963 | 1.0 | 328.9622 | 7.8611 |
| 1.9735 | 40 | 93440 | 1.8955 | 1.0 | 328.4680 | 7.8629 |
| 1.9618 | 41 | 95776 | 1.8912 | 1.0 | 336.6663 | 7.9950 |
| 1.9276 | 42 | 98112 | 1.8850 | 1.0 | 328.1633 | 7.9522 |
| 1.9324 | 43 | 100448 | 1.8807 | 1.0 | 331.0138 | 7.9116 |
| 1.8694 | 44 | 102784 | 1.8756 | 1.0 | 330.6471 | 8.0294 |
| 1.9165 | 45 | 105120 | 1.8682 | 1.0 | 331.4836 | 8.1219 |
| 1.8714 | 46 | 107456 | 1.8655 | 1.0 | 329.9200 | 8.1032 |
| 1.8841 | 47 | 109792 | 1.8599 | 1.0 | 326.8646 | 8.1485 |
| 1.855 | 48 | 112128 | 1.8633 | 1.0 | 331.9095 | 8.1450 |
| 1.8642 | 49 | 114464 | 1.8543 | 1.0 | 331.2229 | 8.1218 |
| 1.8601 | 50 | 116800 | 1.8516 | 1.0 | 327.4974 | 8.2038 |
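
Scores like those in the Bleu column can be computed with the evaluate library's sacrebleu metric; a minimal sketch follows, with the caveat that the exact metric configuration used for this run is not documented on the card.

```python
import evaluate

bleu = evaluate.load("sacrebleu")

# Illustrative inputs: decoded model outputs paired with one reference each.
predictions = ["El viejo caminaba por la orilla."]
references = [["El anciano caminaba por la orilla."]]

# compute() returns a dict; "score" is the corpus-level BLEU.
print(bleu.compute(predictions=predictions, references=references)["score"])
```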

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
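
A small illustrative snippet for checking a local environment against the versions listed above:

```python
import transformers, torch, datasets, tokenizers

# Expected per this card: 4.57.0, 2.8.0+cu128, 4.2.0, 0.22.1
for name, mod in [("Transformers", transformers), ("Pytorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(name, mod.__version__)
```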