29004e70c0b068445756e946fd41c0e8

This model is a fine-tuned version of google/mt5-small on the fr-pt (French–Portuguese) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3858
  • Data Size: 1.0
  • Epoch Runtime: 6.0970
  • Bleu: 4.0242
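The BLEU figure above is presumably the corpus-level score produced by the evaluation library during training. As a reminder of what the metric measures, here is a minimal sentence-level sketch with add-one smoothing; it is illustrative only and is not the exact scorer used for this card:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Illustrative sentence-level BLEU (0-100) with add-one smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = 0.0
    for n in range(1, max_n + 1):
        ref_ng, hyp_ng = ngrams(ref, n), ngrams(hyp, n)
        overlap = sum((hyp_ng & ref_ng).values())   # clipped n-gram matches
        total = max(sum(hyp_ng.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        log_precisions += math.log((overlap + 1) / (total + 1)) / max_n
    # brevity penalty punishes hypotheses shorter than the reference
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return 100.0 * bp * math.exp(log_precisions)
```

A perfect match scores 100; dropping words lowers both the n-gram precisions and the brevity penalty.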

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
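The per-device and total batch sizes above are linked by the number of devices: with data-parallel training, the effective batch size is the per-device batch size times the device count. A minimal sketch (plain Python, values copied from the list above) showing that relationship:

```python
# Hyperparameters reported above (multi-GPU, data-parallel training).
hparams = {
    "learning_rate": 5e-05,
    "train_batch_size": 8,   # per device
    "eval_batch_size": 8,    # per device
    "seed": 42,
    "num_devices": 4,
    "lr_scheduler_type": "constant",
    "num_epochs": 50,
}

# Effective (total) batch size = per-device batch size * number of devices.
total_train_batch_size = hparams["train_batch_size"] * hparams["num_devices"]
total_eval_batch_size = hparams["eval_batch_size"] * hparams["num_devices"]
```

Both totals come out to 32, matching the total_train_batch_size and total_eval_batch_size entries above.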

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log        | 0     | 0    | 27.5438         | 0         | 1.1526        | 0.0058 |
| No log        | 1     | 31   | 28.4233         | 0.0078    | 1.2516        | 0.0063 |
| No log        | 2     | 62   | 28.0428         | 0.0156    | 1.7271        | 0.0045 |
| No log        | 3     | 93   | 26.5597         | 0.0312    | 2.1117        | 0.0059 |
| No log        | 4     | 124  | 24.4093         | 0.0625    | 2.3954        | 0.0041 |
| No log        | 5     | 155  | 22.2079         | 0.125     | 2.7253        | 0.0082 |
| No log        | 6     | 186  | 17.6759         | 0.25      | 3.1398        | 0.0094 |
| 3.5054        | 7     | 217  | 13.5341         | 0.5       | 4.4290        | 0.0117 |
| 3.5054        | 8     | 248  | 9.8188          | 1.0       | 6.9570        | 0.0154 |
| 10.928        | 9     | 279  | 7.8998          | 1.0       | 6.9664        | 0.0167 |
| 10.4524       | 10    | 310  | 6.6437          | 1.0       | 6.9542        | 0.0186 |
| 10.4524       | 11    | 341  | 5.4714          | 1.0       | 7.4791        | 0.0186 |
| 7.6418        | 12    | 372  | 4.6749          | 1.0       | 7.6182        | 0.0317 |
| 6.0055        | 13    | 403  | 3.7281          | 1.0       | 7.4343        | 0.0887 |
| 6.0055        | 14    | 434  | 3.3707          | 1.0       | 8.3537        | 0.3530 |
| 4.9541        | 15    | 465  | 3.1714          | 1.0       | 5.9927        | 1.1423 |
| 4.9541        | 16    | 496  | 3.0677          | 1.0       | 5.9569        | 1.2514 |
| 4.4308        | 17    | 527  | 2.9753          | 1.0       | 5.8289        | 1.5034 |
| 4.1102        | 18    | 558  | 2.9014          | 1.0       | 6.2157        | 1.7000 |
| 4.1102        | 19    | 589  | 2.8467          | 1.0       | 6.3563        | 1.8398 |
| 3.9069        | 20    | 620  | 2.7960          | 1.0       | 6.8002        | 2.2300 |
| 3.7412        | 21    | 651  | 2.7565          | 1.0       | 7.2603        | 2.3792 |
| 3.7412        | 22    | 682  | 2.7301          | 1.0       | 7.5230        | 2.4908 |
| 3.6364        | 23    | 713  | 2.6889          | 1.0       | 7.9714        | 2.7077 |
| 3.6364        | 24    | 744  | 2.6600          | 1.0       | 9.1440        | 2.8903 |
| 3.498         | 25    | 775  | 2.6445          | 1.0       | 8.9545        | 2.9441 |
| 3.4366        | 26    | 806  | 2.6197          | 1.0       | 8.8483        | 2.9626 |
| 3.4366        | 27    | 837  | 2.5967          | 1.0       | 8.5830        | 3.0582 |
| 3.3506        | 28    | 868  | 2.5836          | 1.0       | 6.0944        | 3.1140 |
| 3.3506        | 29    | 899  | 2.5682          | 1.0       | 6.0129        | 3.1509 |
| 3.2866        | 30    | 930  | 2.5502          | 1.0       | 5.9571        | 3.1536 |
| 3.2124        | 31    | 961  | 2.5412          | 1.0       | 6.0268        | 3.2192 |
| 3.2124        | 32    | 992  | 2.5242          | 1.0       | 6.0170        | 3.2674 |
| 3.1606        | 33    | 1023 | 2.5124          | 1.0       | 6.0481        | 3.2754 |
| 3.1155        | 34    | 1054 | 2.4978          | 1.0       | 5.8964        | 3.2844 |
| 3.1155        | 35    | 1085 | 2.4921          | 1.0       | 5.9531        | 3.4115 |
| 3.0601        | 36    | 1116 | 2.4796          | 1.0       | 6.3192        | 3.5079 |
| 3.0601        | 37    | 1147 | 2.4673          | 1.0       | 6.8573        | 3.5407 |
| 2.998         | 38    | 1178 | 2.4623          | 1.0       | 7.3572        | 3.6119 |
| 2.9555        | 39    | 1209 | 2.4581          | 1.0       | 8.2948        | 3.6769 |
| 2.9555        | 40    | 1240 | 2.4465          | 1.0       | 8.2879        | 3.6685 |
| 2.9102        | 41    | 1271 | 2.4395          | 1.0       | 8.4370        | 3.7489 |
| 2.8699        | 42    | 1302 | 2.4366          | 1.0       | 8.7327        | 3.7991 |
| 2.8699        | 43    | 1333 | 2.4264          | 1.0       | 5.8849        | 3.7835 |
| 2.8366        | 44    | 1364 | 2.4188          | 1.0       | 5.8377        | 3.8390 |
| 2.8366        | 45    | 1395 | 2.4157          | 1.0       | 5.9253        | 3.9188 |
| 2.7911        | 46    | 1426 | 2.4078          | 1.0       | 6.1507        | 3.9007 |
| 2.764         | 47    | 1457 | 2.4002          | 1.0       | 5.9784        | 3.9577 |
| 2.764         | 48    | 1488 | 2.3913          | 1.0       | 5.9478        | 3.9526 |
| 2.7201        | 49    | 1519 | 2.3885          | 1.0       | 5.9027        | 3.9700 |
| 2.6912        | 50    | 1550 | 2.3858          | 1.0       | 6.0970        | 4.0242 |
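The Data Size column starts at roughly 1/128 of the training set at epoch 1 and doubles each epoch until the full set is reached at epoch 8, which looks like a progressive data-scaling schedule. A sketch of that pattern, inferred from the table values rather than from the training code:

```python
def data_fraction(epoch: int) -> float:
    """Fraction of the training data used at a given epoch, inferred from
    the Data Size column: 0 at epoch 0, then 1/128 doubling every epoch
    until the full dataset (1.0) is reached at epoch 8."""
    if epoch <= 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - 8))
```

This reproduces the table: epoch 1 gives 0.0078125 (reported as 0.0078), epoch 4 gives 0.0625, and every epoch from 8 onward uses the full dataset.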

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1