c5a6aa1cac98056a363e6433add3f0d1

This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [en-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2190
  • Data Size: 1.0
  • Epoch Runtime: 670.7207
  • Bleu: 12.7918

Model description

More information needed

Intended uses & limitations

More information needed
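Pending fuller documentation, a minimal inference sketch is possible. The repository id matches this model; the `"translate English to French: "` task prefix is an assumption carried over from T5-style translation fine-tuning and may not match the actual preprocessing used here:

```python
# Assumed task prefix: T5-family translation fine-tunes usually prepend one,
# but the exact prefix (if any) is not documented in this card.
PREFIX = "translate English to French: "

def build_inputs(texts, prefix=PREFIX):
    """Prepend the (assumed) task prefix to each source sentence."""
    return [prefix + t for t in texts]

if __name__ == "__main__":
    # Heavy imports kept here so the helper above stays dependency-free.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    repo = "contemmcm/c5a6aa1cac98056a363e6433add3f0d1"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)

    batch = tokenizer(build_inputs(["The cat sleeps."]), return_tensors="pt")
    out = model.generate(**batch, max_new_tokens=64)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))
```

If the checkpoint was trained without a prefix, pass `prefix=""` to `build_inputs`.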

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
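The per-device and total batch sizes above are consistent: with 4 devices, data-parallel training, and no gradient accumulation, the effective batch size is the per-device size times the device count. A small sketch collecting the same values (the dict keys are illustrative, not a real API):

```python
# Hyperparameters copied from the list above, gathered as a plain dict.
hparams = {
    "learning_rate": 5e-5,
    "train_batch_size": 8,   # per device
    "eval_batch_size": 8,    # per device
    "seed": 42,
    "num_devices": 4,
    "lr_scheduler_type": "constant",
    "num_epochs": 50,
    "optimizer": "adamw_torch",
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-8,
}

# Effective batch size = per-device batch size x device count
# (no gradient accumulation is listed in this card).
total_train_batch_size = hparams["train_batch_size"] * hparams["num_devices"]
total_eval_batch_size = hparams["eval_batch_size"] * hparams["num_devices"]
print(total_train_batch_size, total_eval_batch_size)  # -> 32 32
```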

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 16.0589 | 0 | 50.9773 | 0.0100 |
| No log | 1 | 3177 | 11.1813 | 0.0078 | 56.8912 | 0.0142 |
| 0.2118 | 2 | 6354 | 6.6049 | 0.0156 | 63.2922 | 0.0194 |
| 4.6504 | 3 | 9531 | 2.6897 | 0.0312 | 73.9785 | 0.9699 |
| 3.2417 | 4 | 12708 | 2.1665 | 0.0625 | 93.2277 | 4.1242 |
| 2.65 | 5 | 15885 | 1.9351 | 0.125 | 130.2395 | 5.0878 |
| 2.3925 | 6 | 19062 | 1.8160 | 0.25 | 205.5718 | 6.3567 |
| 2.1286 | 7 | 22239 | 1.6737 | 0.5 | 357.6948 | 7.8982 |
| 1.9196 | 8 | 25416 | 1.5453 | 1.0 | 661.9557 | 9.3487 |
| 1.7459 | 9 | 28593 | 1.4706 | 1.0 | 659.2342 | 9.9047 |
| 1.685 | 10 | 31770 | 1.4182 | 1.0 | 664.3399 | 10.4173 |
| 1.5963 | 11 | 34947 | 1.3772 | 1.0 | 665.4371 | 10.7262 |
| 1.5236 | 12 | 38124 | 1.3546 | 1.0 | 670.7171 | 10.9350 |
| 1.4636 | 13 | 41301 | 1.3315 | 1.0 | 669.2230 | 11.0899 |
| 1.4238 | 14 | 44478 | 1.3111 | 1.0 | 670.5610 | 11.2603 |
| 1.388 | 15 | 47655 | 1.2931 | 1.0 | 670.4118 | 11.5175 |
| 1.3304 | 16 | 50832 | 1.2822 | 1.0 | 671.8367 | 11.5027 |
| 1.3156 | 17 | 54009 | 1.2675 | 1.0 | 674.5763 | 11.6799 |
| 1.2983 | 18 | 57186 | 1.2560 | 1.0 | 673.3043 | 11.8304 |
| 1.2557 | 19 | 60363 | 1.2506 | 1.0 | 677.0719 | 11.9009 |
| 1.2131 | 20 | 63540 | 1.2390 | 1.0 | 677.3352 | 11.9842 |
| 1.1947 | 21 | 66717 | 1.2344 | 1.0 | 668.4357 | 12.1337 |
| 1.1726 | 22 | 69894 | 1.2325 | 1.0 | 673.0276 | 12.1706 |
| 1.1496 | 23 | 73071 | 1.2251 | 1.0 | 669.9871 | 12.3075 |
| 1.1575 | 24 | 76248 | 1.2194 | 1.0 | 665.4835 | 12.3460 |
| 1.128 | 25 | 79425 | 1.2180 | 1.0 | 669.1375 | 12.4199 |
| 1.1023 | 26 | 82602 | 1.2168 | 1.0 | 667.3919 | 12.4697 |
| 1.0738 | 27 | 85779 | 1.2137 | 1.0 | 663.2292 | 12.5530 |
| 1.0703 | 28 | 88956 | 1.2098 | 1.0 | 668.9212 | 12.5429 |
| 1.0455 | 29 | 92133 | 1.2121 | 1.0 | 664.4037 | 12.6790 |
| 1.0289 | 30 | 95310 | 1.2152 | 1.0 | 674.1455 | 12.6942 |
| 1.0149 | 31 | 98487 | 1.2121 | 1.0 | 665.4537 | 12.7257 |
| 1.0017 | 32 | 101664 | 1.2087 | 1.0 | 667.0060 | 12.7818 |
| 0.9835 | 33 | 104841 | 1.2099 | 1.0 | 666.1137 | 12.8043 |
| 0.9792 | 34 | 108018 | 1.2111 | 1.0 | 669.2019 | 12.8400 |
| 0.9377 | 35 | 111195 | 1.2129 | 1.0 | 663.0137 | 12.7757 |
| 0.9088 | 36 | 114372 | 1.2190 | 1.0 | 670.7207 | 12.7918 |
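The Data Size column suggests a warm-up schedule: the fraction of the training set roughly doubles each epoch, from 1/128 at epoch 1 until the full dataset is reached at epoch 8. This is an inference from the table, not a documented training flag. A sketch of that schedule:

```python
def data_size(epoch: int) -> float:
    """Fraction of the training set used at a given epoch, as read off the
    results table: 1/128 at epoch 1, doubling each epoch, capped at 1.0.
    This reconstructs the pattern in the table; the actual schedule used
    during training is not documented in this card."""
    if epoch == 0:
        return 0.0
    return min(1.0, 2 ** (epoch - 1) / 128)

# Table values are rounded to four decimals (e.g. 1/128 = 0.0078125 -> 0.0078).
print(round(data_size(1), 4), data_size(7), data_size(8))  # -> 0.0078 0.5 1.0
```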

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1