64584b7c6826bf6aa72d546ba8abf557

This model is a fine-tuned version of google/mt5-large on the Helsinki-NLP/opus_books [es-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7005
  • Data Size: 1.0
  • Epoch Runtime: 24.1667
  • Bleu: 11.8501

Model description

More information needed

Intended uses & limitations

More information needed
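The card leaves intended usage undocumented. A minimal inference sketch with the `transformers` Auto classes, assuming the repository id `contemmcm/64584b7c6826bf6aa72d546ba8abf557` is available and that the model translates Spanish to Portuguese, matching the es-pt training pair (the example sentence and generation settings are illustrative, not from the card):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical checkpoint id; substitute your own copy of the fine-tuned model.
model_id = "contemmcm/64584b7c6826bf6aa72d546ba8abf557"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Spanish source sentence in, Portuguese translation out (es-pt pair).
inputs = tokenizer("Había una vez un reino junto al mar.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```

Beam search (`num_beams=4`) is a common default for translation; the fine-tuning run itself does not specify decoding parameters.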

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
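The reported totals follow from the per-device batch sizes and the device count; a quick sanity check in plain Python (assuming no gradient accumulation, since none is listed):

```python
# Per-device settings from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 4

# Effective batch size = per-device batch size × number of devices.
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 32 32
```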

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0    | 24.2349         | 0         | 2.1315        | 0.0258  |
| No log        | 1     | 33   | 23.5409         | 0.0078    | 2.5022        | 0.0259  |
| No log        | 2     | 66   | 21.6421         | 0.0156    | 4.7012        | 0.0237  |
| No log        | 3     | 99   | 22.7850         | 0.0312    | 7.1662        | 0.0176  |
| 1.2843        | 4     | 132  | 21.1930         | 0.0625    | 10.2092       | 0.0164  |
| 1.2843        | 5     | 165  | 20.8555         | 0.125     | 12.5009       | 0.0168  |
| 1.2843        | 6     | 198  | 17.9570         | 0.25      | 13.9833       | 0.0236  |
| 4.287         | 7     | 231  | 18.8896         | 0.5       | 15.6441       | 0.0204  |
| 14.796        | 8     | 264  | 14.5836         | 1.0       | 23.3535       | 0.0213  |
| 14.796        | 9     | 297  | 9.6964          | 1.0       | 22.6342       | 0.0273  |
| 17.6945       | 10    | 330  | 6.3427          | 1.0       | 20.0977       | 0.0227  |
| 9.8726        | 11    | 363  | 6.1424          | 1.0       | 20.8876       | 0.0207  |
| 9.8726        | 12    | 396  | 6.0423          | 1.0       | 21.3110       | 0.0182  |
| 8.0695        | 13    | 429  | 5.9941          | 1.0       | 21.3595       | 0.0174  |
| 7.5223        | 14    | 462  | 5.3786          | 1.0       | 20.5379       | 0.0597  |
| 7.5223        | 15    | 495  | 5.1967          | 1.0       | 21.3496       | 0.0828  |
| 6.7361        | 16    | 528  | 3.1778          | 1.0       | 22.7931       | 0.3027  |
| 4.6037        | 17    | 561  | 2.0157          | 1.0       | 20.7813       | 1.1779  |
| 4.6037        | 18    | 594  | 1.8168          | 1.0       | 22.1433       | 6.8359  |
| 2.4518        | 19    | 627  | 1.7521          | 1.0       | 23.4117       | 7.5031  |
| 1.9653        | 20    | 660  | 1.7062          | 1.0       | 20.7907       | 7.8407  |
| 1.9653        | 21    | 693  | 1.6971          | 1.0       | 20.7467       | 8.5446  |
| 1.743         | 22    | 726  | 1.6720          | 1.0       | 21.6457       | 9.1907  |
| 1.5908        | 23    | 759  | 1.6659          | 1.0       | 23.3958       | 9.8446  |
| 1.5908        | 24    | 792  | 1.6796          | 1.0       | 21.4205       | 10.2767 |
| 1.4478        | 25    | 825  | 1.6936          | 1.0       | 22.3413       | 10.6892 |
| 1.3382        | 26    | 858  | 1.7027          | 1.0       | 22.7737       | 11.3011 |
| 1.3382        | 27    | 891  | 1.7005          | 1.0       | 24.1667       | 11.8501 |
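The Bleu column reports a corpus-similarity score on a 0–100 scale. As an illustration only (the card does not specify which BLEU implementation was used, so this is not the exact metric), a minimal sentence-level BLEU in pure Python:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams = ngrams(hyp, n)
        ref_ngrams = ngrams(ref, n)
        # Clipped counts: each hypothesis n-gram credited at most
        # as often as it appears in the reference.
        overlap = sum((hyp_ngrams & ref_ngrams).values())
        total = max(sum(hyp_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean

print(round(100 * bleu("el gato se sienta", "el gato se sienta"), 2))  # 100.0
```

Real evaluations typically use a standardized implementation such as sacrebleu, which adds tokenization rules and smoothing; the sketch above only shows the core computation behind the scores in the table.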

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1