4613e52fc5486ec23a4cb323d2180803

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [en-fi] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0263
  • Data Size: 1.0
  • Epoch Runtime: 15.6265
  • Bleu: 2.6435

Model description

More information needed

Intended uses & limitations

More information needed
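The card does not document how to run the model. As a minimal, untested usage sketch (assuming the checkpoint is publicly loadable from the Hub under this repo id, and that no task prefix is required — the training preprocessing is not documented here), an English→Finnish translation call might look like:

```python
# Hypothetical usage sketch — not taken from the model card.
# Assumes the checkpoint loads via transformers' translation pipeline.
from transformers import pipeline

translator = pipeline(
    "translation_en_to_fi",
    model="contemmcm/4613e52fc5486ec23a4cb323d2180803",
)
result = translator("The cat sat on the mat.", max_length=64)
print(result[0]["translation_text"])
```

Given the final BLEU of 2.64 on opus_books, output quality should be expected to be rough.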

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
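The total batch sizes above are derived, not set directly: per-device batch size × number of devices (× gradient-accumulation steps, which default to 1 when not listed). A small sketch of that arithmetic (the helper name is illustrative, not from the training code):

```python
# Effective batch size under multi-GPU data parallelism:
# each of the 4 devices processes its own per-device batch per step.
def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    return per_device * num_devices * grad_accum

# Matches the card: train_batch_size=8 on 4 GPUs -> total_train_batch_size=32.
print(total_batch_size(8, 4))  # → 32
```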

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---------------|-------|------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0    | 26.8070         | 0         | 1.8649        | 0.0019 |
| No log        | 1     | 91   | 26.6464         | 0.0078    | 2.9198        | 0.0021 |
| No log        | 2     | 182  | 25.2832         | 0.0156    | 2.7108        | 0.0024 |
| No log        | 3     | 273  | 23.9219         | 0.0312    | 3.1279        | 0.0022 |
| No log        | 4     | 364  | 22.2360         | 0.0625    | 3.4430        | 0.0025 |
| No log        | 5     | 455  | 18.5403         | 0.125     | 4.3440        | 0.0030 |
| No log        | 6     | 546  | 14.9720         | 0.25      | 6.0982        | 0.0038 |
| 2.1573        | 7     | 637  | 10.0920         | 0.5       | 9.4463        | 0.0057 |
| 10.8751       | 8.0   | 728  | 6.6132          | 1.0       | 16.2321       | 0.0266 |
| 7.1163        | 9.0   | 819  | 4.4833          | 1.0       | 14.2330       | 0.0333 |
| 5.5747        | 10.0  | 910  | 4.0264          | 1.0       | 14.7734       | 0.4297 |
| 5.0514        | 11.0  | 1001 | 3.8025          | 1.0       | 15.0448       | 0.6325 |
| 4.8985        | 12.0  | 1092 | 3.6709          | 1.0       | 15.4207       | 0.8203 |
| 4.6751        | 13.0  | 1183 | 3.5885          | 1.0       | 15.1690       | 1.1198 |
| 4.5114        | 14.0  | 1274 | 3.5234          | 1.0       | 15.3498       | 1.3589 |
| 4.3597        | 15.0  | 1365 | 3.4709          | 1.0       | 14.0717       | 1.4570 |
| 4.2411        | 16.0  | 1456 | 3.4312          | 1.0       | 14.1590       | 1.5430 |
| 4.1973        | 17.0  | 1547 | 3.3930          | 1.0       | 15.4894       | 1.5711 |
| 4.1391        | 18.0  | 1638 | 3.3629          | 1.0       | 14.7562       | 1.6843 |
| 4.0203        | 19.0  | 1729 | 3.3375          | 1.0       | 14.6328       | 1.7110 |
| 3.9747        | 20.0  | 1820 | 3.3116          | 1.0       | 14.5719       | 1.7470 |
| 3.9412        | 21.0  | 1911 | 3.2892          | 1.0       | 15.0229       | 1.8933 |
| 3.861         | 22.0  | 2002 | 3.2745          | 1.0       | 14.2382       | 1.9115 |
| 3.861         | 23.0  | 2093 | 3.2472          | 1.0       | 14.5442       | 1.9207 |
| 3.7894        | 24.0  | 2184 | 3.2347          | 1.0       | 14.7812       | 2.0149 |
| 3.7753        | 25.0  | 2275 | 3.2174          | 1.0       | 15.1183       | 2.0566 |
| 3.7182        | 26.0  | 2366 | 3.2017          | 1.0       | 14.5415       | 2.0655 |
| 3.7048        | 27.0  | 2457 | 3.1873          | 1.0       | 14.6949       | 2.1741 |
| 3.6486        | 28.0  | 2548 | 3.1744          | 1.0       | 14.9361       | 2.1382 |
| 3.6047        | 29.0  | 2639 | 3.1601          | 1.0       | 15.7903       | 2.1818 |
| 3.5836        | 30.0  | 2730 | 3.1512          | 1.0       | 14.1550       | 2.2143 |
| 3.5298        | 31.0  | 2821 | 3.1389          | 1.0       | 14.6037       | 2.2481 |
| 3.524         | 32.0  | 2912 | 3.1312          | 1.0       | 14.4274       | 2.2530 |
| 3.4796        | 33.0  | 3003 | 3.1226          | 1.0       | 15.4424       | 2.2744 |
| 3.4087        | 34.0  | 3094 | 3.1153          | 1.0       | 15.5786       | 2.3144 |
| 3.4386        | 35.0  | 3185 | 3.1058          | 1.0       | 15.4120       | 2.2947 |
| 3.3912        | 36.0  | 3276 | 3.0996          | 1.0       | 15.5306       | 2.2894 |
| 3.3854        | 37.0  | 3367 | 3.0888          | 1.0       | 14.0790       | 2.3126 |
| 3.3341        | 38.0  | 3458 | 3.0861          | 1.0       | 14.5252       | 2.3687 |
| 3.3062        | 39.0  | 3549 | 3.0738          | 1.0       | 14.3299       | 2.3899 |
| 3.3033        | 40.0  | 3640 | 3.0772          | 1.0       | 14.8870       | 2.4320 |
| 3.2496        | 41.0  | 3731 | 3.0690          | 1.0       | 15.3802       | 2.4647 |
| 3.2566        | 42.0  | 3822 | 3.0659          | 1.0       | 15.5431       | 2.4617 |
| 3.2492        | 43.0  | 3913 | 3.0568          | 1.0       | 15.3644       | 2.5582 |
| 3.2006        | 44.0  | 4004 | 3.0567          | 1.0       | 15.2988       | 2.5684 |
| 3.189         | 45.0  | 4095 | 3.0512          | 1.0       | 14.0501       | 2.6085 |
| 3.1572        | 46.0  | 4186 | 3.0428          | 1.0       | 14.6061       | 2.5761 |
| 3.1075        | 47.0  | 4277 | 3.0340          | 1.0       | 15.0633       | 2.5948 |
| 3.1011        | 48.0  | 4368 | 3.0314          | 1.0       | 15.0853       | 2.6284 |
| 3.0978        | 49.0  | 4459 | 3.0329          | 1.0       | 15.1390       | 2.5948 |
| 3.0689        | 50.0  | 4550 | 3.0263          | 1.0       | 15.6265       | 2.6435 |
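The Bleu column grows from effectively zero to 2.64 over 50 epochs, which is typical for a small model on opus_books. For readers unfamiliar with the metric, here is a simplified sentence-level BLEU sketch (uniform n-gram weights and a crude smoothing term; this is an illustration, not the scorer used to produce the numbers above):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Crude smoothing so a zero match does not zero out the score.
        p = clipped / total if clipped > 0 else 1.0 / (2 * total)
        log_precisions.append(math.log(p))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("a b c d e", "a b c d e"))  # perfect match → 1.0
```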

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model size

  • 0.6B params (Safetensors, tensor type F32)
Model tree for contemmcm/4613e52fc5486ec23a4cb323d2180803

  • Base model: google/mt5-small (this model is a fine-tune of it)