ea498f2f8800ae82b2b80e3f3e8bf7e9

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [fr-no] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8706
  • Data Size: 1.0
  • Epoch Runtime: 13.7158
  • Bleu: 2.8681
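As a usage sketch, the checkpoint can be loaded like any other seq2seq model. The repo id below is taken from this card's repository path; `max_new_tokens` is an illustrative choice, not a value from the training run:

```python
# Usage sketch for this fine-tuned mT5 checkpoint (fr -> no translation).
# The repo id is taken from this card's repository path; generation
# settings here are assumptions, not the training configuration.
MODEL_ID = "contemmcm/ea498f2f8800ae82b2b80e3f3e8bf7e9"


def translate(text: str) -> str:
    """Translate a French sentence to Norwegian with the fine-tuned model."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Note that downloading the checkpoint requires network access; the function defers the `transformers` import until it is called.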

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
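The derived batch sizes follow directly from the per-device settings. A quick arithmetic check (the dataset-size estimate is an inference from the step counts in the results table, not a stated fact):

```python
# Each of the 4 GPUs processes 8 examples per step, so one optimizer
# step covers 8 * 4 = 32 training examples.
train_batch_size = 8
num_devices = 4
total_train_batch_size = train_batch_size * num_devices
print(total_train_batch_size)  # 32

# The results table logs 86 steps per full-data epoch, so the training
# split holds roughly 86 * 32 = 2752 examples (the last batch may be partial).
approx_train_examples = 86 * total_train_batch_size
print(approx_train_examples)  # 2752
```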

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 24.6813 | 0 | 1.7986 | 0.0009 |
| No log | 1 | 86 | 24.7378 | 0.0078 | 2.8486 | 0.0008 |
| No log | 2 | 172 | 24.3812 | 0.0156 | 2.3142 | 0.0008 |
| No log | 3 | 258 | 23.0589 | 0.0312 | 2.7868 | 0.0014 |
| No log | 4 | 344 | 20.8463 | 0.0625 | 3.5438 | 0.0015 |
| 1.3681 | 5 | 430 | 18.1595 | 0.125 | 4.3812 | 0.0027 |
| 5.2509 | 6 | 516 | 15.6510 | 0.25 | 6.1068 | 0.0020 |
| 5.5813 | 7 | 602 | 10.3997 | 0.5 | 8.8942 | 0.0033 |
| 6.4297 | 8.0 | 688 | 5.4761 | 1.0 | 15.3246 | 0.0122 |
| 6.5951 | 9.0 | 774 | 4.0052 | 1.0 | 13.4675 | 0.2429 |
| 5.0986 | 10.0 | 860 | 3.6742 | 1.0 | 14.4898 | 0.4658 |
| 4.841 | 11.0 | 946 | 3.5062 | 1.0 | 14.6042 | 0.6311 |
| 4.4852 | 12.0 | 1032 | 3.4093 | 1.0 | 14.1982 | 0.7627 |
| 4.2868 | 13.0 | 1118 | 3.3364 | 1.0 | 14.5429 | 1.0343 |
| 4.138 | 14.0 | 1204 | 3.2795 | 1.0 | 14.1576 | 1.2543 |
| 4.0571 | 15.0 | 1290 | 3.2347 | 1.0 | 14.0612 | 1.3775 |
| 3.9666 | 16.0 | 1376 | 3.2016 | 1.0 | 13.3305 | 1.4498 |
| 3.8844 | 17.0 | 1462 | 3.1669 | 1.0 | 13.7788 | 1.5649 |
| 3.8281 | 18.0 | 1548 | 3.1439 | 1.0 | 14.5092 | 1.6304 |
| 3.7532 | 19.0 | 1634 | 3.1191 | 1.0 | 14.4961 | 1.6749 |
| 3.7084 | 20.0 | 1720 | 3.0959 | 1.0 | 14.8120 | 1.8000 |
| 3.6705 | 21.0 | 1806 | 3.0784 | 1.0 | 14.5898 | 1.8761 |
| 3.6522 | 22.0 | 1892 | 3.0642 | 1.0 | 14.6089 | 1.8872 |
| 3.5636 | 23.0 | 1978 | 3.0461 | 1.0 | 13.4842 | 1.9836 |
| 3.551 | 24.0 | 2064 | 3.0356 | 1.0 | 13.7628 | 2.0694 |
| 3.4916 | 25.0 | 2150 | 3.0229 | 1.0 | 14.1647 | 2.0876 |
| 3.4554 | 26.0 | 2236 | 3.0042 | 1.0 | 14.4404 | 2.1515 |
| 3.4224 | 27.0 | 2322 | 2.9986 | 1.0 | 15.4571 | 2.1357 |
| 3.3942 | 28.0 | 2408 | 2.9844 | 1.0 | 15.4711 | 2.1600 |
| 3.3785 | 29.0 | 2494 | 2.9769 | 1.0 | 15.4662 | 2.2534 |
| 3.3361 | 30.0 | 2580 | 2.9689 | 1.0 | 15.7904 | 2.2643 |
| 3.3061 | 31.0 | 2666 | 2.9625 | 1.0 | 14.0855 | 2.2744 |
| 3.2623 | 32.0 | 2752 | 2.9541 | 1.0 | 13.9531 | 2.3525 |
| 3.2314 | 33.0 | 2838 | 2.9447 | 1.0 | 13.9750 | 2.3598 |
| 3.2346 | 34.0 | 2924 | 2.9440 | 1.0 | 14.0985 | 2.4240 |
| 3.2265 | 35.0 | 3010 | 2.9302 | 1.0 | 14.2094 | 2.4658 |
| 3.1866 | 36.0 | 3096 | 2.9258 | 1.0 | 13.8554 | 2.4391 |
| 3.1598 | 37.0 | 3182 | 2.9187 | 1.0 | 14.1753 | 2.4766 |
| 3.1336 | 38.0 | 3268 | 2.9156 | 1.0 | 14.1022 | 2.4855 |
| 3.0847 | 39.0 | 3354 | 2.9069 | 1.0 | 13.5672 | 2.5021 |
| 3.0866 | 40.0 | 3440 | 2.9001 | 1.0 | 13.4121 | 2.5222 |
| 3.0738 | 41.0 | 3526 | 2.8992 | 1.0 | 13.9182 | 2.5807 |
| 3.0578 | 42.0 | 3612 | 2.8924 | 1.0 | 14.3426 | 2.6003 |
| 3.0272 | 43.0 | 3698 | 2.8916 | 1.0 | 15.0226 | 2.6529 |
| 3.0087 | 44.0 | 3784 | 2.8856 | 1.0 | 15.5647 | 2.7099 |
| 2.9882 | 45.0 | 3870 | 2.8847 | 1.0 | 15.5668 | 2.7470 |
| 2.9678 | 46.0 | 3956 | 2.8827 | 1.0 | 15.8990 | 2.7260 |
| 2.9595 | 47.0 | 4042 | 2.8777 | 1.0 | 13.5686 | 2.7856 |
| 2.9079 | 48.0 | 4128 | 2.8710 | 1.0 | 13.6765 | 2.8488 |
| 2.9165 | 49.0 | 4214 | 2.8716 | 1.0 | 13.7148 | 2.8691 |
| 2.8845 | 50.0 | 4300 | 2.8706 | 1.0 | 13.7158 | 2.8681 |

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1