---
library_name: transformers
license: apache-2.0
base_model: google/umt5-small
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: 1bd9cceed2dcd10f2ece1070a2e20a3c
    results: []
---

1bd9cceed2dcd10f2ece1070a2e20a3c

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [es-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8072
  • Data Size: 1.0
  • Epoch Runtime: 112.5041
  • Bleu: 4.3082
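
As a quick usage sketch, the checkpoint can be loaded with the standard `transformers` Seq2Seq API. The Hub repo id below is an assumption (the card only states the model name), so substitute the actual path before running:

```python
# Assumed Hub path -- replace with the real repo id of this checkpoint.
MODEL_ID = "contemmcm/1bd9cceed2dcd10f2ece1070a2e20a3c"

def translate(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Greedy Spanish -> Italian translation with the fine-tuned umt5-small."""
    # Deferred import: loading the model requires network access and weights.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(translate("El libro está sobre la mesa."))
```

Note that at BLEU ≈ 4.3 the translations are rough; treat the model as a fine-tuning baseline rather than a production translator.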

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
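
The per-device and total batch sizes above are consistent with each other and with the step counts in the results table below; a minimal sanity check (the 721 steps per epoch is read off the table):

```python
# Effective batch size: per-device batch size times number of GPUs.
per_device_train_batch = 8
num_devices = 4
total_train_batch = per_device_train_batch * num_devices  # 32, as listed above

# At full data size, each epoch adds 721 optimizer steps (e.g. 5768 -> 6489).
steps_per_epoch = 721
examples_per_epoch = steps_per_epoch * total_train_batch

print(total_train_batch)    # 32
print(examples_per_epoch)   # 23072 examples seen per epoch (last batch may be smaller)
```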

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 16.6425         | 0         | 10.0867       | 0.2928 |
| No log        | 1     | 721   | 16.3540         | 0.0078    | 10.9510       | 0.2893 |
| No log        | 2     | 1442  | 14.6128         | 0.0156    | 11.9953       | 0.3034 |
| 0.3458        | 3     | 2163  | 12.9897         | 0.0312    | 14.2539       | 0.3222 |
| 1.0187        | 4     | 2884  | 9.4280          | 0.0625    | 17.0721       | 0.3791 |
| 9.7765        | 5     | 3605  | 6.0200          | 0.125     | 23.5481       | 0.4979 |
| 6.4556        | 6     | 4326  | 4.5495          | 0.25      | 35.6383       | 1.6521 |
| 5.2523        | 7     | 5047  | 3.9789          | 0.5       | 61.2487       | 1.6179 |
| 4.508         | 8     | 5768  | 3.5314          | 1.0       | 112.2478      | 2.0289 |
| 4.1967        | 9     | 6489  | 3.3986          | 1.0       | 112.5871      | 2.3105 |
| 4.052         | 10    | 7210  | 3.3132          | 1.0       | 113.8064      | 2.5055 |
| 3.8898        | 11    | 7931  | 3.2623          | 1.0       | 112.9548      | 2.6756 |
| 3.8333        | 12    | 8652  | 3.2131          | 1.0       | 113.5531      | 2.7829 |
| 3.7702        | 13    | 9373  | 3.1821          | 1.0       | 113.0508      | 2.8880 |
| 3.6635        | 14    | 10094 | 3.1422          | 1.0       | 113.4677      | 3.0043 |
| 3.6578        | 15    | 10815 | 3.1133          | 1.0       | 113.1681      | 3.0899 |
| 3.5582        | 16    | 11536 | 3.0999          | 1.0       | 113.1700      | 3.1533 |
| 3.5449        | 17    | 12257 | 3.0735          | 1.0       | 114.2252      | 3.2124 |
| 3.5093        | 18    | 12978 | 3.0548          | 1.0       | 112.8411      | 3.2856 |
| 3.4384        | 19    | 13699 | 3.0419          | 1.0       | 113.2164      | 3.3314 |
| 3.4229        | 20    | 14420 | 3.0157          | 1.0       | 113.5167      | 3.3987 |
| 3.4119        | 21    | 15141 | 3.0014          | 1.0       | 113.0884      | 3.4310 |
| 3.3609        | 22    | 15862 | 2.9874          | 1.0       | 113.3006      | 3.5151 |
| 3.2723        | 23    | 16583 | 2.9811          | 1.0       | 114.8710      | 3.5543 |
| 3.2748        | 24    | 17304 | 2.9645          | 1.0       | 114.0400      | 3.6138 |
| 3.2806        | 25    | 18025 | 2.9625          | 1.0       | 113.1700      | 3.6308 |
| 3.2696        | 26    | 18746 | 2.9382          | 1.0       | 113.3355      | 3.6929 |
| 3.2254        | 27    | 19467 | 2.9330          | 1.0       | 112.4022      | 3.6982 |
| 3.2108        | 28    | 20188 | 2.9252          | 1.0       | 113.1494      | 3.7675 |
| 3.1536        | 29    | 20909 | 2.9150          | 1.0       | 113.0551      | 3.8057 |
| 3.1271        | 30    | 21630 | 2.9039          | 1.0       | 113.0676      | 3.8281 |
| 3.1324        | 31    | 22351 | 2.9001          | 1.0       | 113.4059      | 3.8688 |
| 3.1245        | 32    | 23072 | 2.8917          | 1.0       | 114.1657      | 3.9119 |
| 3.0853        | 33    | 23793 | 2.8821          | 1.0       | 113.8688      | 3.9384 |
| 3.025         | 34    | 24514 | 2.8809          | 1.0       | 113.4756      | 3.9585 |
| 3.0303        | 35    | 25235 | 2.8723          | 1.0       | 112.4681      | 3.9852 |
| 3.0046        | 36    | 25956 | 2.8594          | 1.0       | 113.2854      | 3.9970 |
| 2.9943        | 37    | 26677 | 2.8579          | 1.0       | 113.8893      | 4.0160 |
| 2.9874        | 38    | 27398 | 2.8528          | 1.0       | 112.7151      | 4.0269 |
| 2.9358        | 39    | 28119 | 2.8503          | 1.0       | 113.6051      | 4.0450 |
| 2.9332        | 40    | 28840 | 2.8432          | 1.0       | 112.3515      | 4.0958 |
| 2.9513        | 41    | 29561 | 2.8370          | 1.0       | 113.1157      | 4.1324 |
| 2.9465        | 42    | 30282 | 2.8293          | 1.0       | 112.8311      | 4.1777 |
| 2.8816        | 43    | 31003 | 2.8295          | 1.0       | 114.5790      | 4.1632 |
| 2.8867        | 44    | 31724 | 2.8162          | 1.0       | 114.1613      | 4.1918 |
| 2.8684        | 45    | 32445 | 2.8202          | 1.0       | 112.4990      | 4.2068 |
| 2.8588        | 46    | 33166 | 2.8130          | 1.0       | 113.9478      | 4.2499 |
| 2.817         | 47    | 33887 | 2.8068          | 1.0       | 113.8720      | 4.2553 |
| 2.8057        | 48    | 34608 | 2.8122          | 1.0       | 112.9857      | 4.2949 |
| 2.8197        | 49    | 35329 | 2.8053          | 1.0       | 112.6193      | 4.3000 |
| 2.8217        | 50    | 36050 | 2.8072          | 1.0       | 112.5041      | 4.3082 |
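
A short summary of the run, computed from the first and last rows of the table above:

```python
# End-to-end change over the 50 epochs, taken from the results table.
first_val_loss, last_val_loss = 16.6425, 2.8072
first_bleu, last_bleu = 0.2928, 4.3082

loss_reduction = (first_val_loss - last_val_loss) / first_val_loss
bleu_gain = last_bleu - first_bleu

print(f"validation loss fell by {loss_reduction:.1%}")  # roughly 83%
print(f"BLEU improved by {bleu_gain:.4f} points")       # 4.0154
```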

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1