ab93ed2e9d34eebca5e5aa081372bebc

This model is a fine-tuned version of google/mt5-large on the Helsinki-NLP/opus_books [de-en] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6164
  • Data Size: 1.0
  • Epoch Runtime: 518.7110
  • Bleu: 11.2354
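
Since the usage sections below are still unfilled, here is a minimal inference sketch. The repo id is taken from this page's model tree, and the absence of a task prefix is an assumption, since the preprocessing used for fine-tuning is not documented:

```python
# Minimal usage sketch; the repo id comes from this page's model tree, and
# passing the German sentence without a task prefix is an assumption.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/ab93ed2e9d34eebca5e5aa081372bebc"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```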

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
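
The dataset named at the top of this card can be loaded with the datasets library. A minimal sketch; note that opus_books ships only a train split, and how it was split for evaluation in this run is not documented:

```python
# Load the de-en configuration of the dataset named on this card.
# opus_books provides only a "train" split; the train/validation split
# used for this run is not documented.
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus_books", "de-en")
print(dataset["train"][0]["translation"])  # {'de': '...', 'en': '...'}
```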

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a rough sketch of the equivalent Seq2SeqTrainingArguments follows the list:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
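
These settings map onto transformers Seq2SeqTrainingArguments roughly as below. This is a reconstruction, not the original training script; the output directory and generation flag are assumptions, and the per-device batch sizes reflect the 4-GPU layout listed above:

```python
# Rough reconstruction of the listed hyperparameters; output_dir and
# predict_with_generate are assumptions, not taken from the original run.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-opus-books-de-en",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs = total batch size 32
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # needed to compute BLEU during eval (assumed)
)
```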

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:-------:|
| No log | 0 | 0 | 23.6601 | 0 | 39.3538 | 0.0188 |
| No log | 1 | 1286 | 21.2471 | 0.0078 | 43.0395 | 0.0080 |
| 0.4589 | 2 | 2572 | 10.7776 | 0.0156 | 49.1449 | 0.0202 |
| 0.4297 | 3 | 3858 | 15.8878 | 0.0312 | 58.4469 | 0.0380 |
| 0.4698 | 4 | 5144 | 2.8676 | 0.0625 | 75.4727 | 0.5967 |
| 2.7418 | 5 | 6430 | 2.1100 | 0.125 | 105.0327 | 8.8864 |
| 2.3992 | 6 | 7716 | 1.9327 | 0.25 | 164.3769 | 8.1645 |
| 2.1975 | 7 | 9002 | 1.8312 | 0.5 | 283.6717 | 9.0939 |
| 2.0602 | 8 | 10288 | 1.7396 | 1.0 | 517.8994 | 10.0316 |
| 1.903 | 9 | 11574 | 1.6808 | 1.0 | 516.5106 | 10.1810 |
| 1.7955 | 10 | 12860 | 1.6475 | 1.0 | 517.5944 | 10.6325 |
| 1.6836 | 11 | 14146 | 1.6328 | 1.0 | 517.5800 | 10.8419 |
| 1.6617 | 12 | 15432 | 1.6095 | 1.0 | 518.8468 | 10.8037 |
| 1.5574 | 13 | 16718 | 1.6049 | 1.0 | 518.3699 | 10.8925 |
| 1.5154 | 14 | 18004 | 1.5989 | 1.0 | 518.8894 | 10.8774 |
| 1.4819 | 15 | 19290 | 1.5961 | 1.0 | 519.9667 | 11.0827 |
| 1.4005 | 16 | 20576 | 1.6002 | 1.0 | 522.3767 | 11.1317 |
| 1.3689 | 17 | 21862 | 1.6039 | 1.0 | 519.4761 | 11.1058 |
| 1.3182 | 18 | 23148 | 1.6082 | 1.0 | 519.4104 | 11.0625 |
| 1.2775 | 19 | 24434 | 1.6164 | 1.0 | 518.7110 | 11.2354 |
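
The Bleu column above is a corpus-level score. A minimal sketch of computing it with the evaluate library's sacrebleu metric, assuming default settings since the exact metric configuration for this run is not documented:

```python
# Minimal BLEU computation sketch; sacrebleu defaults are assumed, since
# the metric configuration used for this run is not documented.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The book is on the table."]     # model outputs
references = [["The book lies on the table."]]  # one reference list per sample
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU, e.g. the ~11.2 reported above
```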

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1