119d40fde8cc3c24912b601104f9dd6b

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fr-no] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.2158
  • Data Size: 1.0
  • Epoch Runtime: 15.7291
  • Bleu: 3.7429
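
A minimal inference sketch for this checkpoint, assuming the repo id shown in this card and the standard transformers seq2seq API; whether the training script used a task prefix is not documented, so the raw source sentence is passed as-is:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id taken from this card; adjust if the checkpoint lives elsewhere.
repo_id = "contemmcm/119d40fde8cc3c24912b601104f9dd6b"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

# French -> Norwegian, the fr-no pair this model was fine-tuned on.
inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```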

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
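
Beyond the dataset id given in the summary, the data is not documented here. For orientation, a hedged sketch of loading that language pair with the datasets library; opus_books ships a single train split of sentence pairs, so any train/evaluation split behind the results below is an assumption of the (unpublished) training script:

```python
from datasets import load_dataset

# Dataset id and language-pair config taken from the summary above.
dataset = load_dataset("Helsinki-NLP/opus_books", "fr-no")
print(dataset)                             # DatasetDict with a "train" split
print(dataset["train"][0]["translation"])  # {"fr": "...", "no": "..."}
```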

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
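
A hedged sketch of how these values might be expressed with Seq2SeqTrainingArguments; the actual training script is not part of this card, and the output directory below is hypothetical:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fr-no",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 8 per device across 4 GPUs -> total 32
    per_device_eval_batch_size=8,    # 8 per device across 4 GPUs -> total 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```

The listed totals of 32 match 8 examples per device across the 4 listed GPUs, with no gradient-accumulation setting reported.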

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|---------------|-------|------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0    | 20.2441         | 0         | 1.8601        | 0.0704 |
| No log        | 1     | 86   | 20.4134         | 0.0078    | 1.9973        | 0.0483 |
| No log        | 2     | 172  | 20.2769         | 0.0156    | 3.0826        | 0.0462 |
| No log        | 3     | 258  | 20.0506         | 0.0312    | 3.5037        | 0.0504 |
| No log        | 4     | 344  | 19.3243         | 0.0625    | 4.1233        | 0.0462 |
| 1.0975        | 5     | 430  | 18.2588         | 0.125     | 4.8835        | 0.0467 |
| 4.6037        | 6     | 516  | 15.7345         | 0.25      | 6.3307        | 0.0446 |
| 5.5027        | 7     | 602  | 11.2304         | 0.5       | 9.4857        | 0.0542 |
| 7.0204        | 8     | 688  | 7.5414          | 1.0       | 15.6707       | 0.1923 |
| 8.4824        | 9     | 774  | 6.1525          | 1.0       | 14.3672       | 0.1869 |
| 6.9753        | 10    | 860  | 5.0391          | 1.0       | 14.2547       | 0.8677 |
| 6.4922        | 11    | 946  | 4.7010          | 1.0       | 14.1415       | 0.7576 |
| 5.8641        | 12    | 1032 | 4.4488          | 1.0       | 14.3490       | 0.8668 |
| 5.4654        | 13    | 1118 | 4.2347          | 1.0       | 14.3414       | 1.1044 |
| 5.1539        | 14    | 1204 | 4.0326          | 1.0       | 14.9090       | 1.4052 |
| 5.0033        | 15    | 1290 | 3.8507          | 1.0       | 14.8050       | 1.5826 |
| 4.7892        | 16    | 1376 | 3.7268          | 1.0       | 14.3627       | 1.7526 |
| 4.6346        | 17    | 1462 | 3.6404          | 1.0       | 14.7356       | 1.9106 |
| 4.5431        | 18    | 1548 | 3.5882          | 1.0       | 15.3488       | 2.1228 |
| 4.4212        | 19    | 1634 | 3.5567          | 1.0       | 15.1713       | 2.3188 |
| 4.3455        | 20    | 1720 | 3.5244          | 1.0       | 15.3461       | 2.4114 |
| 4.2834        | 21    | 1806 | 3.4975          | 1.0       | 15.7892       | 2.4112 |
| 4.2198        | 22    | 1892 | 3.4743          | 1.0       | 16.6737       | 2.4421 |
| 4.1230        | 23    | 1978 | 3.4423          | 1.0       | 14.5820       | 2.6232 |
| 4.0733        | 24    | 2064 | 3.4133          | 1.0       | 14.3646       | 2.7541 |
| 3.9988        | 25    | 2150 | 3.3990          | 1.0       | 14.7306       | 2.8319 |
| 3.9794        | 26    | 2236 | 3.3789          | 1.0       | 15.2129       | 2.8298 |
| 3.9185        | 27    | 2322 | 3.3705          | 1.0       | 15.3848       | 2.8578 |
| 3.8666        | 28    | 2408 | 3.3515          | 1.0       | 15.2893       | 2.9725 |
| 3.8388        | 29    | 2494 | 3.3385          | 1.0       | 15.2697       | 3.0124 |
| 3.8084        | 30    | 2580 | 3.3300          | 1.0       | 15.5868       | 3.0930 |
| 3.7702        | 31    | 2666 | 3.3205          | 1.0       | 14.2415       | 3.0933 |
| 3.7115        | 32    | 2752 | 3.3123          | 1.0       | 14.3792       | 3.1280 |
| 3.6728        | 33    | 2838 | 3.3053          | 1.0       | 14.4939       | 3.1815 |
| 3.6494        | 34    | 2924 | 3.3076          | 1.0       | 14.8109       | 3.1974 |
| 3.6546        | 35    | 3010 | 3.2860          | 1.0       | 15.6599       | 3.2049 |
| 3.6139        | 36    | 3096 | 3.2780          | 1.0       | 15.4160       | 3.3345 |
| 3.5774        | 37    | 3182 | 3.2705          | 1.0       | 15.2976       | 3.3522 |
| 3.5437        | 38    | 3268 | 3.2703          | 1.0       | 15.9273       | 3.3657 |
| 3.4861        | 39    | 3354 | 3.2604          | 1.0       | 14.6697       | 3.3889 |
| 3.4780        | 40    | 3440 | 3.2512          | 1.0       | 14.7191       | 3.3770 |
| 3.4634        | 41    | 3526 | 3.2473          | 1.0       | 15.4507       | 3.4208 |
| 3.4544        | 42    | 3612 | 3.2548          | 1.0       | 15.4143       | 3.5544 |
| 3.4180        | 43    | 3698 | 3.2395          | 1.0       | 15.8460       | 3.5035 |
| 3.4076        | 44    | 3784 | 3.2340          | 1.0       | 15.6204       | 3.5561 |
| 3.3806        | 45    | 3870 | 3.2334          | 1.0       | 15.7972       | 3.6003 |
| 3.3351        | 46    | 3956 | 3.2240          | 1.0       | 14.5148       | 3.6058 |
| 3.3344        | 47    | 4042 | 3.2176          | 1.0       | 15.0169       | 3.5453 |
| 3.2764        | 48    | 4128 | 3.2136          | 1.0       | 15.1349       | 3.5881 |
| 3.2828        | 49    | 4214 | 3.2134          | 1.0       | 15.5410       | 3.6484 |
| 3.2482        | 50    | 4300 | 3.2158          | 1.0       | 15.7291       | 3.7429 |
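
The Bleu values above appear to be on a 0-100 scale. The card does not state which BLEU implementation was used; a minimal scoring sketch with the evaluate library, under the assumption of sacreBLEU:

```python
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Hei, hvordan har du det?"]   # hypothetical model outputs
references = [["Hei, hvordan har du det?"]]  # one list of references per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```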

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1