c591ac3fb982261b29782cd25fe3c5a2

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [es-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7078
  • Data Size: 1.0
  • Epoch Runtime: 216.9740
  • Bleu: 12.6009
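
As a quick start, the sketch below loads the checkpoint for Spanish-to-French generation. It is a minimal sketch rather than the author's documented usage: the repo ID is taken from this card's model tree, and it assumes the fine-tuned model translates raw Spanish input without a task prefix (the card does not document one).

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo ID from this card's model tree; swap in a local path if needed.
model_id = "contemmcm/c591ac3fb982261b29782cd25fe3c5a2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: the fine-tuned model maps Spanish source text to French
# directly, with no task prefix (none is documented in this card).
inputs = tokenizer("La vida es sueño.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```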

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
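
In the absence of documented preprocessing, here is a minimal sketch of loading the base dataset named above. It assumes the standard es-fr configuration of opus_books, which ships only a train split; the 90/10 evaluation split is an illustrative assumption, not taken from this card.

```python
from datasets import load_dataset

# opus_books provides a single "train" split per language pair; the
# 90/10 split below is an assumption, not documented by this card.
raw = load_dataset("Helsinki-NLP/opus_books", "es-fr")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)

example = splits["train"][0]["translation"]
print(example["es"], "->", example["fr"])
```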

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
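
For reference, the list above maps onto Seq2SeqTrainingArguments roughly as sketched below. This is an approximation, not the author's script: output_dir is a hypothetical placeholder, and the total batch size of 32 arises from the per-device size of 8 across 4 GPUs under a multi-GPU launcher rather than from any single argument.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder
# not specified by the card. Launched across 4 GPUs (e.g. via torchrun or
# accelerate), per-device batch size 8 gives the total batch size of 32.
training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-es-fr",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # required for generation-based BLEU
)
```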

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0     | 14.9661         | 0         | 18.4328       | 0.1698  |
| No log        | 1     | 1407  | 13.8678         | 0.0078    | 21.2159       | 0.1976  |
| No log        | 2     | 2814  | 12.1745         | 0.0156    | 21.7496       | 0.1463  |
| 0.3826        | 3     | 4221  | 10.0626         | 0.0312    | 25.6574       | 0.1737  |
| 9.4091        | 4     | 5628  | 6.0042          | 0.0625    | 31.2974       | 0.4278  |
| 5.7989        | 5     | 7035  | 3.7447          | 0.125     | 43.6643       | 4.9009  |
| 4.5277        | 6     | 8442  | 3.1458          | 0.25      | 69.2407       | 3.8997  |
| 3.7289        | 7     | 9849  | 2.6678          | 0.5       | 117.4917      | 5.3978  |
| 3.2112        | 8.0   | 11256 | 2.4129          | 1.0       | 214.7005      | 6.6841  |
| 2.9436        | 9.0   | 12663 | 2.2953          | 1.0       | 216.1568      | 7.5042  |
| 2.8647        | 10.0  | 14070 | 2.2094          | 1.0       | 213.5292      | 8.1059  |
| 2.6919        | 11.0  | 15477 | 2.1563          | 1.0       | 212.4440      | 8.6020  |
| 2.6117        | 12.0  | 16884 | 2.1075          | 1.0       | 214.2269      | 8.9490  |
| 2.5448        | 13.0  | 18291 | 2.0759          | 1.0       | 213.5605      | 9.2504  |
| 2.5115        | 14.0  | 19698 | 2.0463          | 1.0       | 214.8060      | 9.4776  |
| 2.4257        | 15.0  | 21105 | 2.0095          | 1.0       | 214.7406      | 9.7150  |
| 2.3938        | 16.0  | 22512 | 1.9898          | 1.0       | 215.9306      | 9.8699  |
| 2.3467        | 17.0  | 23919 | 1.9705          | 1.0       | 215.7179      | 10.0840 |
| 2.298         | 18.0  | 25326 | 1.9575          | 1.0       | 215.5572      | 10.2474 |
| 2.2712        | 19.0  | 26733 | 1.9382          | 1.0       | 213.5135      | 10.3838 |
| 2.2264        | 20.0  | 28140 | 1.9135          | 1.0       | 214.5938      | 10.5263 |
| 2.1897        | 21.0  | 29547 | 1.8935          | 1.0       | 216.1186      | 10.6721 |
| 2.167         | 22.0  | 30954 | 1.8883          | 1.0       | 212.1273      | 10.7909 |
| 2.1503        | 23.0  | 32361 | 1.8746          | 1.0       | 217.0957      | 10.8991 |
| 2.0907        | 24.0  | 33768 | 1.8560          | 1.0       | 215.7370      | 11.0596 |
| 2.1052        | 25.0  | 35175 | 1.8475          | 1.0       | 216.2399      | 11.1471 |
| 2.0652        | 26.0  | 36582 | 1.8431          | 1.0       | 214.3941      | 11.2386 |
| 2.0244        | 27.0  | 37989 | 1.8248          | 1.0       | 216.7905      | 11.3224 |
| 2.0077        | 28.0  | 39396 | 1.8150          | 1.0       | 215.1501      | 11.4013 |
| 2.0417        | 29.0  | 40803 | 1.8087          | 1.0       | 214.4889      | 11.4483 |
| 2.008         | 30.0  | 42210 | 1.8023          | 1.0       | 213.1100      | 11.5161 |
| 1.9606        | 31.0  | 43617 | 1.7926          | 1.0       | 214.1516      | 11.6453 |
| 1.9298        | 32.0  | 45024 | 1.7977          | 1.0       | 214.0810      | 11.6786 |
| 1.938         | 33.0  | 46431 | 1.7829          | 1.0       | 223.3107      | 11.7729 |
| 1.8941        | 34.0  | 47838 | 1.7701          | 1.0       | 223.4652      | 11.8105 |
| 1.9073        | 35.0  | 49245 | 1.7751          | 1.0       | 221.6524      | 11.8768 |
| 1.8586        | 36.0  | 50652 | 1.7664          | 1.0       | 223.5310      | 11.9486 |
| 1.8678        | 37.0  | 52059 | 1.7554          | 1.0       | 217.4213      | 12.0003 |
| 1.8227        | 38.0  | 53466 | 1.7456          | 1.0       | 217.4962      | 12.0672 |
| 1.7791        | 39.0  | 54873 | 1.7452          | 1.0       | 220.3917      | 12.0985 |
| 1.8189        | 40.0  | 56280 | 1.7439          | 1.0       | 217.9907      | 12.1608 |
| 1.8328        | 41.0  | 57687 | 1.7462          | 1.0       | 217.8165      | 12.2203 |
| 1.8378        | 42.0  | 59094 | 1.7370          | 1.0       | 217.3545      | 12.3155 |
| 1.7768        | 43.0  | 60501 | 1.7386          | 1.0       | 219.9349      | 12.3127 |
| 1.752         | 44.0  | 61908 | 1.7253          | 1.0       | 219.1841      | 12.3925 |
| 1.7366        | 45.0  | 63315 | 1.7222          | 1.0       | 216.8871      | 12.4186 |
| 1.6971        | 46.0  | 64722 | 1.7219          | 1.0       | 216.8452      | 12.4796 |
| 1.7429        | 47.0  | 66129 | 1.7136          | 1.0       | 219.9515      | 12.5299 |
| 1.6928        | 48.0  | 67536 | 1.7111          | 1.0       | 217.4749      | 12.5543 |
| 1.6711        | 49.0  | 68943 | 1.7052          | 1.0       | 218.2828      | 12.5817 |
| 1.6935        | 50.0  | 70350 | 1.7078          | 1.0       | 216.9740      | 12.6009 |
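
The Bleu column above is presumably produced by generation-based evaluation (predict_with_generate). The card does not say which BLEU implementation was used; the sketch below assumes the sacrebleu wrapper from the evaluate library, the common choice in transformers translation examples.

```python
import evaluate

# Assumption: BLEU computed via the evaluate library's sacrebleu wrapper;
# this card does not state the actual implementation used.
bleu = evaluate.load("sacrebleu")

predictions = ["Le chat dort sur le tapis."]   # detokenized model outputs
references = [["Le chat dort sur le tapis."]]  # one or more references each
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # corpus-level BLEU; 100.0 for an exact match
```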

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1