ac46bfb05ca97f2c3c0d5d441dc1f320

This model is a fine-tuned version of google/umt5-small on the fi-no configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 3.3014
  • Data Size: 1.0 (fraction of the training data used)
  • Epoch Runtime: 15.5650 s
  • BLEU: 3.2782
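
A minimal inference sketch, assuming the checkpoint is published under the repo id shown on this page (contemmcm/ac46bfb05ca97f2c3c0d5d441dc1f320), that the translation direction is Finnish → Norwegian, and that no task prefix is needed; the example sentence and generation settings are illustrative:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id as shown on this model page (assumption: the checkpoint is public).
model_id = "contemmcm/ac46bfb05ca97f2c3c0d5d441dc1f320"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Finnish input; the fi -> no direction is assumed from the dataset pair.
inputs = tokenizer("Hyvää huomenta!", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```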

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
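
The summary above names the training corpus; below is a minimal sketch for loading it with the datasets library. The split name and the translation field layout follow the standard opus_books format and are assumptions, since the train/eval split used here is not documented:

```python
from datasets import load_dataset

# fi-no pair of OPUS Books, as named in the model summary above.
books = load_dataset("Helsinki-NLP/opus_books", "fi-no")
example = books["train"][0]
# Expected layout (assumption): {"id": "0", "translation": {"fi": "...", "no": "..."}}
print(example["translation"])
```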

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
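
A sketch of how the list above maps onto Seq2SeqTrainingArguments; output_dir, predict_with_generate, and any setting not listed are assumptions. Note that a per-device batch size of 8 across 4 GPUs yields the reported total batch size of 32:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fi-no",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed; needed to report BLEU during eval
)
```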

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | BLEU |
|---------------|-------|------|-----------------|-----------|-------------------|--------|
| No log        | 0     | 0    | 20.4846         | 0         | 1.8456            | 0.0471 |
| No log        | 1     | 85   | 20.0629         | 0.0078    | 2.0750            | 0.0410 |
| No log        | 2     | 170  | 19.3595         | 0.0156    | 2.6262            | 0.0541 |
| No log        | 3     | 255  | 18.7857         | 0.0312    | 3.0738            | 0.0694 |
| No log        | 4     | 340  | 17.2868         | 0.0625    | 3.7254            | 0.0387 |
| 1.2802        | 5     | 425  | 15.4411         | 0.125     | 4.6881            | 0.0419 |
| 1.2802        | 6     | 510  | 13.1572         | 0.25      | 6.2857            | 0.0545 |
| 4.8141        | 7     | 595  | 9.9638          | 0.5       | 9.6160            | 0.0748 |
| 10.6633       | 8     | 680  | 6.9914          | 1.0       | 16.4476           | 0.1602 |
| 7.6292        | 9     | 765  | 5.3567          | 1.0       | 14.2130           | 0.4702 |
| 6.5098        | 10    | 850  | 4.8971          | 1.0       | 14.5216           | 1.1747 |
| 6.2063        | 11    | 935  | 4.6364          | 1.0       | 15.0752           | 1.0439 |
| 5.7514        | 12    | 1020 | 4.4694          | 1.0       | 14.9287           | 0.9951 |
| 5.4377        | 13    | 1105 | 4.2971          | 1.0       | 15.0105           | 1.2044 |
| 5.3090        | 14    | 1190 | 4.1205          | 1.0       | 15.5370           | 2.0124 |
| 5.0263        | 15    | 1275 | 3.9475          | 1.0       | 14.3123           | 1.3760 |
| 4.8406        | 16    | 1360 | 3.8006          | 1.0       | 13.9997           | 1.6711 |
| 4.7169        | 17    | 1445 | 3.7074          | 1.0       | 14.0292           | 1.9271 |
| 4.5842        | 18    | 1530 | 3.6490          | 1.0       | 14.2387           | 1.9551 |
| 4.4437        | 19    | 1615 | 3.6177          | 1.0       | 15.4691           | 2.1789 |
| 4.3329        | 20    | 1700 | 3.5798          | 1.0       | 15.1585           | 2.2025 |
| 4.2839        | 21    | 1785 | 3.5639          | 1.0       | 15.2358           | 2.2920 |
| 4.2330        | 22    | 1870 | 3.5289          | 1.0       | 14.3394           | 2.3262 |
| 4.1427        | 23    | 1955 | 3.5152          | 1.0       | 14.6519           | 2.3887 |
| 4.1117        | 24    | 2040 | 3.4927          | 1.0       | 14.9714           | 2.5290 |
| 4.0378        | 25    | 2125 | 3.4744          | 1.0       | 15.1210           | 2.6254 |
| 4.0233        | 26    | 2210 | 3.4548          | 1.0       | 15.5285           | 2.6719 |
| 3.9641        | 27    | 2295 | 3.4447          | 1.0       | 15.9118           | 2.6654 |
| 3.9449        | 28    | 2380 | 3.4381          | 1.0       | 15.5406           | 2.6565 |
| 3.8673        | 29    | 2465 | 3.4221          | 1.0       | 15.5655           | 2.7155 |
| 3.8012        | 30    | 2550 | 3.4152          | 1.0       | 14.2643           | 2.7333 |
| 3.8407        | 31    | 2635 | 3.4047          | 1.0       | 14.5685           | 2.7547 |
| 3.8055        | 32    | 2720 | 3.3925          | 1.0       | 14.4025           | 2.8800 |
| 3.7316        | 33    | 2805 | 3.3804          | 1.0       | 14.7314           | 2.8752 |
| 3.7355        | 34    | 2890 | 3.3729          | 1.0       | 14.5378           | 2.9105 |
| 3.6632        | 35    | 2975 | 3.3657          | 1.0       | 15.2909           | 2.9530 |
| 3.6315        | 36    | 3060 | 3.3596          | 1.0       | 15.1494           | 2.9893 |
| 3.6343        | 37    | 3145 | 3.3544          | 1.0       | 15.5905           | 3.1070 |
| 3.5968        | 38    | 3230 | 3.3444          | 1.0       | 14.3635           | 2.9994 |
| 3.6004        | 39    | 3315 | 3.3377          | 1.0       | 14.2375           | 3.0890 |
| 3.5587        | 40    | 3400 | 3.3384          | 1.0       | 14.3461           | 3.0214 |
| 3.5362        | 41    | 3485 | 3.3334          | 1.0       | 14.8068           | 3.0894 |
| 3.4851        | 42    | 3570 | 3.3233          | 1.0       | 15.2558           | 3.1220 |
| 3.4934        | 43    | 3655 | 3.3145          | 1.0       | 14.9720           | 3.1507 |
| 3.4576        | 44    | 3740 | 3.3182          | 1.0       | 15.4616           | 3.1191 |
| 3.4556        | 45    | 3825 | 3.3177          | 1.0       | 15.9163           | 3.1289 |
| 3.4143        | 46    | 3910 | 3.3114          | 1.0       | 14.9332           | 3.1962 |
| 3.3592        | 47    | 3995 | 3.3054          | 1.0       | 14.9497           | 3.2189 |
| 3.3725        | 48    | 4080 | 3.2991          | 1.0       | 14.9902           | 3.2019 |
| 3.3191        | 49    | 4165 | 3.2964          | 1.0       | 15.7867           | 3.1985 |
| 3.3174        | 50    | 4250 | 3.3014          | 1.0       | 15.5650           | 3.2782 |
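
The card does not state how BLEU was computed. A common choice for Trainer-based translation fine-tuning is the sacrebleu metric from the evaluate library, sketched below with placeholder strings:

```python
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Dette er en setning."]   # decoded model outputs (placeholders)
references = [["Dette er en setning."]]  # one or more references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU on a 0-100 scale
```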

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1