8d80c4225137775af0858becea32124c

This model is a fine-tuned version of google/mt5-small on the fr-pl (French-Polish) pair of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1074
  • Data Size: 1.0
  • Epoch Runtime: 12.3354
  • Bleu: 0.9452
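As a quick usage reference, here is a minimal inference sketch with the Transformers library. The repository id is taken from this page's model tree; the example sentence, beam settings, and the absence of a task prefix are illustrative assumptions rather than documented choices.

```python
# Minimal translation sketch. Assumptions: the checkpoint is available under
# the id below and was fine-tuned for fr->pl without a task prefix.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/8d80c4225137775af0858becea32124c"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```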

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
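As a rough illustration of how these settings map onto the Transformers trainer, here is a hedged Seq2SeqTrainingArguments sketch; the actual training script is not published in this card, and the output_dir name is hypothetical.

```python
# Hypothetical mapping of the listed hyperparameters onto
# Seq2SeqTrainingArguments; the real training script is not included here.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-fr-pl",  # illustrative name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # x 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,     # needed to compute BLEU during eval
)
```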

Training results

The Data Size column appears to report the fraction of the training set used in each epoch; the schedule doubles the subset each epoch until the full dataset is in use from epoch 8 onward.

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 30.1871 | 0 | 1.6414 | 0.0063 |
| No log | 1 | 70 | 29.9596 | 0.0078 | 2.6833 | 0.0049 |
| No log | 2 | 140 | 29.1372 | 0.0156 | 2.1655 | 0.0065 |
| No log | 3 | 210 | 26.7750 | 0.0312 | 2.5830 | 0.0081 |
| No log | 4 | 280 | 24.2698 | 0.0625 | 2.8853 | 0.0073 |
| No log | 5 | 350 | 22.3393 | 0.125 | 3.7455 | 0.0077 |
| No log | 6 | 420 | 16.7950 | 0.25 | 4.8519 | 0.0108 |
| 3.2519 | 7 | 490 | 12.6226 | 0.5 | 7.1730 | 0.0119 |
| 13.1623 | 8 | 560 | 8.4490 | 1.0 | 12.1719 | 0.0202 |
| 9.943 | 9 | 630 | 6.1646 | 1.0 | 12.3237 | 0.0197 |
| 6.4624 | 10 | 700 | 4.4943 | 1.0 | 11.2531 | 0.0547 |
| 5.5913 | 11 | 770 | 3.9076 | 1.0 | 12.0875 | 0.1409 |
| 5.1688 | 12 | 840 | 3.7176 | 1.0 | 11.7308 | 0.1435 |
| 4.7396 | 13 | 910 | 3.6146 | 1.0 | 12.0834 | 0.1751 |
| 4.5913 | 14 | 980 | 3.5379 | 1.0 | 11.8320 | 0.3039 |
| 4.4126 | 15 | 1050 | 3.4947 | 1.0 | 11.8071 | 0.3769 |
| 4.3419 | 16 | 1120 | 3.4415 | 1.0 | 11.8549 | 0.4286 |
| 4.2584 | 17 | 1190 | 3.4153 | 1.0 | 11.9149 | 0.4475 |
| 4.1421 | 18 | 1260 | 3.3831 | 1.0 | 12.4453 | 0.4651 |
| 4.1184 | 19 | 1330 | 3.3586 | 1.0 | 11.5380 | 0.5262 |
| 4.0221 | 20 | 1400 | 3.3399 | 1.0 | 11.8791 | 0.5627 |
| 3.9764 | 21 | 1470 | 3.3146 | 1.0 | 13.1879 | 0.5702 |
| 3.9722 | 22 | 1540 | 3.2963 | 1.0 | 12.2334 | 0.6373 |
| 3.8672 | 23 | 1610 | 3.2737 | 1.0 | 12.3612 | 0.6230 |
| 3.8153 | 24 | 1680 | 3.2606 | 1.0 | 12.2318 | 0.6282 |
| 3.8111 | 25 | 1750 | 3.2451 | 1.0 | 12.7453 | 0.6573 |
| 3.7494 | 26 | 1820 | 3.2336 | 1.0 | 13.0331 | 0.6433 |
| 3.7313 | 27 | 1890 | 3.2227 | 1.0 | 11.5825 | 0.7294 |
| 3.6749 | 28 | 1960 | 3.2139 | 1.0 | 11.4649 | 0.7157 |
| 3.665 | 29 | 2030 | 3.2049 | 1.0 | 11.5801 | 0.6979 |
| 3.6267 | 30 | 2100 | 3.1945 | 1.0 | 11.7894 | 0.7269 |
| 3.5956 | 31 | 2170 | 3.1884 | 1.0 | 12.8793 | 0.7162 |
| 3.5529 | 32 | 2240 | 3.1804 | 1.0 | 12.3465 | 0.7233 |
| 3.5436 | 33 | 2310 | 3.1742 | 1.0 | 12.6375 | 0.7587 |
| 3.5025 | 34 | 2380 | 3.1637 | 1.0 | 12.9137 | 0.8146 |
| 3.469 | 35 | 2450 | 3.1614 | 1.0 | 13.4902 | 0.8093 |
| 3.4704 | 36 | 2520 | 3.1543 | 1.0 | 11.5728 | 0.7949 |
| 3.4422 | 37 | 2590 | 3.1511 | 1.0 | 12.1080 | 0.8058 |
| 3.4156 | 38 | 2660 | 3.1463 | 1.0 | 11.9007 | 0.8258 |
| 3.3779 | 39 | 2730 | 3.1411 | 1.0 | 12.0938 | 0.8863 |
| 3.3928 | 40 | 2800 | 3.1357 | 1.0 | 12.2520 | 0.8939 |
| 3.3517 | 41 | 2870 | 3.1311 | 1.0 | 12.6665 | 0.8632 |
| 3.3405 | 42 | 2940 | 3.1254 | 1.0 | 13.1554 | 0.9187 |
| 3.3058 | 43 | 3010 | 3.1275 | 1.0 | 12.7304 | 0.8878 |
| 3.2953 | 44 | 3080 | 3.1200 | 1.0 | 14.0716 | 0.9595 |
| 3.281 | 45 | 3150 | 3.1176 | 1.0 | 11.2074 | 0.9117 |
| 3.2565 | 46 | 3220 | 3.1164 | 1.0 | 11.6439 | 0.9633 |
| 3.2368 | 47 | 3290 | 3.1132 | 1.0 | 11.6638 | 0.9907 |
| 3.2169 | 48 | 3360 | 3.1145 | 1.0 | 12.0850 | 0.9506 |
| 3.1816 | 49 | 3430 | 3.1111 | 1.0 | 12.1911 | 0.9197 |
| 3.1793 | 50 | 3500 | 3.1074 | 1.0 | 12.3354 | 0.9452 |
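The Bleu column can be reproduced in spirit with the evaluate library; the sketch below is a hedged illustration, since the exact evaluation code is not documented in this card.

```python
# Hedged sketch of a BLEU computation with the `evaluate` library; the
# original evaluation script is not included here, so the metric choice
# (sacrebleu) and the example strings are assumptions.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Dzień dobry, jak się masz?"]        # hypothetical model output
references = [["Dzień dobry, jak się miewasz?"]]    # one reference list per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacrebleu reports BLEU on a 0-100 scale
```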

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
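To sanity-check a local environment against the versions above, a minimal Python check (assuming all four packages are installed):

```python
# Print installed versions to compare against the ones listed above.
import datasets
import tokenizers
import torch
import transformers

for pkg in (transformers, torch, datasets, tokenizers):
    print(pkg.__name__, pkg.__version__)
```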