ddd4a6aa06d6ae42631b8799ef97d92a

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fr-pl] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4500
  • Data Size: 1.0
  • Epoch Runtime: 12.7105
  • BLEU: 1.3591

Model description

More information needed

Intended uses & limitations

More information needed
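
Pending fuller documentation, the checkpoint can be exercised as a French-to-Polish translator through the standard transformers seq2seq API. The sketch below is illustrative only: the repo id is taken from this page, and it assumes no task prefix was used during fine-tuning (the card does not say). Given the final BLEU of roughly 1.36, outputs should be treated as a demonstration of the training pipeline rather than as usable translations.

```python
# Minimal inference sketch. Assumptions: the repo id below (taken from this
# page) and the absence of a task prefix, which the card does not document.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/ddd4a6aa06d6ae42631b8799ef97d92a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# French source sentence in, Polish translation out.
inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```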

Training and evaluation data

More information needed
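
The dataset named in the header can be inspected directly with the datasets library. A minimal sketch; note that opus_books typically ships only a train split, so how the evaluation set was carved out is not documented here:

```python
# Sketch of loading the fr-pl pair of opus_books (the exact split and
# preprocessing used for this model are not documented on the card).
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "fr-pl")
print(ds["train"][0]["translation"])  # {'fr': '...', 'pl': '...'}
```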

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Seq2SeqTrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
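
As a reference point, the list above corresponds roughly to the following Seq2SeqTrainingArguments. This is a sketch, not the published training script; output_dir is a hypothetical placeholder, and predict_with_generate is inferred from the BLEU column rather than stated on the card.

```python
# Approximate mapping of the listed hyperparameters onto the standard
# transformers trainer API (sketch only; the real script is not published).
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fr-pl",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,     # assumed: generation-based eval for BLEU
)
```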

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 18.5972 | 0 | 1.6580 | 0.3208 |
| No log | 1 | 70 | 18.2067 | 0.0078 | 2.1820 | 0.2932 |
| No log | 2 | 140 | 17.7177 | 0.0156 | 2.3440 | 0.3019 |
| No log | 3 | 210 | 17.1370 | 0.0312 | 2.6829 | 0.2702 |
| No log | 4 | 280 | 16.4891 | 0.0625 | 3.2230 | 0.2946 |
| No log | 5 | 350 | 15.3033 | 0.125 | 4.1663 | 0.2479 |
| No log | 6 | 420 | 13.6401 | 0.25 | 5.1414 | 0.2967 |
| 2.6129 | 7 | 490 | 11.3079 | 0.5 | 8.0574 | 0.4094 |
| 11.6429 | 8 | 560 | 7.8088 | 1.0 | 13.5010 | 0.3318 |
| 9.3515 | 9 | 630 | 5.9276 | 1.0 | 13.4543 | 0.4188 |
| 7.0719 | 10 | 700 | 5.0844 | 1.0 | 11.6559 | 0.4961 |
| 6.5733 | 11 | 770 | 4.8212 | 1.0 | 12.9264 | 0.6565 |
| 6.2979 | 12 | 840 | 4.6587 | 1.0 | 12.6095 | 0.7886 |
| 5.8217 | 13 | 910 | 4.4995 | 1.0 | 12.7657 | 0.9692 |
| 5.6473 | 14 | 980 | 4.3470 | 1.0 | 12.7114 | 1.1158 |
| 5.3408 | 15 | 1050 | 4.2188 | 1.0 | 13.2125 | 1.2465 |
| 5.2398 | 16 | 1120 | 4.0965 | 1.0 | 13.1490 | 0.7502 |
| 5.0955 | 17 | 1190 | 4.0011 | 1.0 | 13.5470 | 0.5956 |
| 4.8827 | 18 | 1260 | 3.8982 | 1.0 | 11.7615 | 0.8255 |
| 4.8067 | 19 | 1330 | 3.8328 | 1.0 | 12.2196 | 0.8645 |
| 4.6766 | 20 | 1400 | 3.7824 | 1.0 | 12.1924 | 0.8885 |
| 4.6094 | 21 | 1470 | 3.7223 | 1.0 | 12.4951 | 0.9598 |
| 4.5736 | 22 | 1540 | 3.6905 | 1.0 | 12.6724 | 0.9687 |
| 4.4444 | 23 | 1610 | 3.6601 | 1.0 | 12.8187 | 0.9986 |
| 4.3824 | 24 | 1680 | 3.6374 | 1.0 | 13.2163 | 1.0042 |
| 4.3578 | 25 | 1750 | 3.6156 | 1.0 | 13.1140 | 1.0644 |
| 4.2655 | 26 | 1820 | 3.5985 | 1.0 | 11.9219 | 1.0492 |
| 4.252 | 27 | 1890 | 3.5755 | 1.0 | 12.3008 | 1.0693 |
| 4.1737 | 28 | 1960 | 3.5686 | 1.0 | 12.7413 | 1.0925 |
| 4.1461 | 29 | 2030 | 3.5534 | 1.0 | 12.6865 | 1.0772 |
| 4.0841 | 30 | 2100 | 3.5476 | 1.0 | 12.8186 | 1.1322 |
| 4.0576 | 31 | 2170 | 3.5374 | 1.0 | 13.5435 | 1.2023 |
| 4.0192 | 32 | 2240 | 3.5235 | 1.0 | 13.4343 | 1.1625 |
| 3.9789 | 33 | 2310 | 3.5156 | 1.0 | 13.1692 | 1.1780 |
| 3.9396 | 34 | 2380 | 3.5059 | 1.0 | 13.4867 | 1.1580 |
| 3.9005 | 35 | 2450 | 3.4989 | 1.0 | 12.1223 | 1.2033 |
| 3.8795 | 36 | 2520 | 3.4927 | 1.0 | 11.9539 | 1.2470 |
| 3.851 | 37 | 2590 | 3.4914 | 1.0 | 12.0542 | 1.2446 |
| 3.8158 | 38 | 2660 | 3.4846 | 1.0 | 12.3248 | 1.1963 |
| 3.7645 | 39 | 2730 | 3.4799 | 1.0 | 12.3383 | 1.2744 |
| 3.7962 | 40 | 2800 | 3.4729 | 1.0 | 12.5361 | 1.2342 |
| 3.7486 | 41 | 2870 | 3.4722 | 1.0 | 13.2700 | 1.2756 |
| 3.7166 | 42 | 2940 | 3.4666 | 1.0 | 12.9179 | 1.3236 |
| 3.685 | 43 | 3010 | 3.4647 | 1.0 | 12.8274 | 1.3277 |
| 3.6813 | 44 | 3080 | 3.4617 | 1.0 | 11.8841 | 1.3205 |
| 3.6468 | 45 | 3150 | 3.4584 | 1.0 | 12.3643 | 1.3308 |
| 3.6138 | 46 | 3220 | 3.4559 | 1.0 | 12.4019 | 1.3099 |
| 3.5968 | 47 | 3290 | 3.4511 | 1.0 | 12.3915 | 1.4182 |
| 3.5644 | 48 | 3360 | 3.4496 | 1.0 | 12.8932 | 1.4216 |
| 3.5602 | 49 | 3430 | 3.4485 | 1.0 | 12.8400 | 1.3770 |
| 3.5279 | 50 | 3500 | 3.4500 | 1.0 | 12.7105 | 1.3591 |
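
The BLEU column is scored against the held-out references; a score in this form is conventionally produced with sacrebleu. Below is a minimal sketch using the evaluate library; this is an assumed workflow, not the card's actual evaluation code, and the sentences are illustrative:

```python
# Hedged example of computing a corpus BLEU score with sacrebleu via evaluate.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Dzień dobry, jak się masz?"]      # model outputs (illustrative)
references = [["Dzień dobry, jak się miewasz?"]]  # gold Polish references
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # same 0-100 scale as the BLEU column above
```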

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1