5e2c020caea3631df1448f618c055624

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fr-nl] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0568
  • Data Size: 1.0
  • Epoch Runtime: 151.4874
  • Bleu: 9.0567
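
Since the card ships no usage snippet, below is a minimal inference sketch. It assumes the checkpoint is hosted as contemmcm/5e2c020caea3631df1448f618c055624 (the repo id this card belongs to) and that no task prefix was used during fine-tuning; the example sentence and generation settings are illustrative only.

```python
# Minimal French -> Dutch inference sketch. The repo id comes from this card;
# the prefix-free input and beam settings are assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/5e2c020caea3631df1448f618c055624"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Le chat dort sur le canapé.", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```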

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
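
The card leaves this section empty, but the summary names the Helsinki-NLP/opus_books [fr-nl] dataset. A hedged loading sketch follows; opus_books ships only a train split, so how the evaluation set was carved out is not documented, and the 10% holdout below is purely an assumption.

```python
# Load the fr-nl configuration of opus_books. The dataset provides only a
# "train" split, so the evaluation split used for this card is assumed to be
# a custom holdout (10% here, illustrative only).
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "fr-nl")
splits = ds["train"].train_test_split(test_size=0.1, seed=42)
print(splits)
```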

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
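
For readers who want to reproduce the setup, the listed values map onto Hugging Face `Seq2SeqTrainingArguments` roughly as sketched below. The output directory and generation-based evaluation are assumptions not stated on the card, as is the multi-GPU launch mechanism (e.g. torchrun or accelerate, matching distributed_type: multi-GPU with 4 devices).

```python
# Hedged mapping of the listed hyperparameters onto Seq2SeqTrainingArguments.
# output_dir and predict_with_generate are assumptions; the rest mirrors the
# bullet list above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fr-nl",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed, so BLEU can be computed at eval time
)
```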

Training results

The Data Size column appears to report the fraction of the training set used in each epoch: the schedule ramps up from a small subset and uses the full data (1.0) from epoch 8 onward.

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 0 | 0 | 16.0469 | 0 | 13.5759 | 0.2792 |
| No log | 1 | 1000 | 15.5384 | 0.0078 | 15.7546 | 0.2854 |
| No log | 2 | 2000 | 14.2046 | 0.0156 | 15.9420 | 0.3117 |
| No log | 3 | 3000 | 11.0596 | 0.0312 | 18.4288 | 0.3507 |
| 0.5046 | 4 | 4000 | 7.2043 | 0.0625 | 23.8459 | 0.3804 |
| 7.5837 | 5 | 5000 | 5.0983 | 0.125 | 31.3535 | 0.9019 |
| 0.354 | 6 | 6000 | 4.1686 | 0.25 | 48.7612 | 3.0741 |
| 0.4226 | 7 | 7000 | 3.5179 | 0.5 | 84.0710 | 2.5292 |
| 4.0056 | 8 | 8000 | 3.0932 | 1.0 | 154.1716 | 3.7938 |
| 3.7016 | 9 | 9000 | 2.9239 | 1.0 | 154.7242 | 4.4936 |
| 3.5164 | 10 | 10000 | 2.8166 | 1.0 | 154.9541 | 4.9045 |
| 3.3913 | 11 | 11000 | 2.7394 | 1.0 | 154.4091 | 5.2539 |
| 3.2694 | 12 | 12000 | 2.6767 | 1.0 | 154.1653 | 5.5500 |
| 3.1879 | 13 | 13000 | 2.6226 | 1.0 | 154.7862 | 5.8114 |
| 3.1059 | 14 | 14000 | 2.5770 | 1.0 | 152.5055 | 6.0134 |
| 3.0226 | 15 | 15000 | 2.5310 | 1.0 | 152.2572 | 6.2154 |
| 2.9793 | 16 | 16000 | 2.5003 | 1.0 | 152.6136 | 6.4185 |
| 2.8864 | 17 | 17000 | 2.4742 | 1.0 | 152.7497 | 6.5705 |
| 2.8905 | 18 | 18000 | 2.4377 | 1.0 | 153.1263 | 6.7235 |
| 2.8353 | 19 | 19000 | 2.4112 | 1.0 | 153.0815 | 6.8518 |
| 2.7771 | 20 | 20000 | 2.3906 | 1.0 | 152.9901 | 7.0136 |
| 2.7758 | 21 | 21000 | 2.3625 | 1.0 | 153.3342 | 7.1366 |
| 2.7155 | 22 | 22000 | 2.3436 | 1.0 | 151.8810 | 7.2288 |
| 2.6817 | 23 | 23000 | 2.3251 | 1.0 | 151.9600 | 7.3268 |
| 2.636 | 24 | 24000 | 2.3105 | 1.0 | 153.9415 | 7.4365 |
| 2.6669 | 25 | 25000 | 2.2921 | 1.0 | 153.0338 | 7.5221 |
| 2.5816 | 26 | 26000 | 2.2731 | 1.0 | 151.8625 | 7.6526 |
| 2.5615 | 27 | 27000 | 2.2599 | 1.0 | 151.9675 | 7.7261 |
| 2.4947 | 28 | 28000 | 2.2530 | 1.0 | 151.5303 | 7.8470 |
| 2.5166 | 29 | 29000 | 2.2336 | 1.0 | 153.0677 | 7.9289 |
| 2.4557 | 30 | 30000 | 2.2256 | 1.0 | 152.1012 | 8.0204 |
| 2.474 | 31 | 31000 | 2.2065 | 1.0 | 154.6577 | 8.0815 |
| 2.4297 | 32 | 32000 | 2.1982 | 1.0 | 153.2791 | 8.1406 |
| 2.3823 | 33 | 33000 | 2.1887 | 1.0 | 151.7204 | 8.2139 |
| 2.3719 | 34 | 34000 | 2.1878 | 1.0 | 153.3640 | 8.2638 |
| 2.3598 | 35 | 35000 | 2.1707 | 1.0 | 152.8024 | 8.3561 |
| 2.3231 | 36 | 36000 | 2.1591 | 1.0 | 154.2330 | 8.4129 |
| 2.3012 | 37 | 37000 | 2.1555 | 1.0 | 153.5931 | 8.4714 |
| 2.317 | 38 | 38000 | 2.1326 | 1.0 | 153.1331 | 8.5374 |
| 2.2725 | 39 | 39000 | 2.1347 | 1.0 | 152.3129 | 8.5907 |
| 2.2476 | 40 | 40000 | 2.1373 | 1.0 | 156.9343 | 8.6044 |
| 2.2446 | 41 | 41000 | 2.1207 | 1.0 | 156.4713 | 8.6799 |
| 2.2218 | 42 | 42000 | 2.1083 | 1.0 | 152.9422 | 8.7118 |
| 2.2042 | 43 | 43000 | 2.1066 | 1.0 | 154.8949 | 8.7772 |
| 2.2087 | 44 | 44000 | 2.1007 | 1.0 | 153.7430 | 8.7890 |
| 2.1733 | 45 | 45000 | 2.0951 | 1.0 | 153.8691 | 8.8444 |
| 2.193 | 46 | 46000 | 2.0931 | 1.0 | 151.8176 | 8.8520 |
| 2.1578 | 47 | 47000 | 2.0772 | 1.0 | 152.4861 | 8.9144 |
| 2.1741 | 48 | 48000 | 2.0803 | 1.0 | 152.9961 | 8.9398 |
| 2.1133 | 49 | 49000 | 2.0752 | 1.0 | 152.8284 | 9.0189 |
| 2.0837 | 50 | 50000 | 2.0568 | 1.0 | 151.4874 | 9.0567 |
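
The Bleu column is not attributed to a specific implementation; a common choice in Transformers translation examples is sacrebleu via the evaluate library, sketched below with hypothetical strings.

```python
# Hedged BLEU sketch using evaluate's sacrebleu wrapper; the exact metric
# configuration behind the card's Bleu column is not documented.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["De kat slaapt op de bank."]    # hypothetical model output
references = [["De kat slaapt op de bank."]]   # hypothetical reference(s)
print(bleu.compute(predictions=predictions, references=references)["score"])
```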

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1