10675fdf2ffcc008846278e8d48333a2

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fi-pl] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4986
  • Data Size: 1.0
  • Epoch Runtime: 13.5887
  • Bleu: 1.2326

Model description

More information needed

Intended uses & limitations

More information needed
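Although this section is unfilled, the model is by construction a Finnish-to-Polish translator. A minimal usage sketch, assuming the repo id shown in the model tree at the bottom of this card and that no task prefix was used during fine-tuning (the card confirms neither):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical repo id, taken from the model tree below.
model_id = "contemmcm/10675fdf2ffcc008846278e8d48333a2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate Finnish to Polish (the fi-pl direction this card reports BLEU for).
inputs = tokenizer("Hyvää huomenta!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the final BLEU of roughly 1.23, outputs should be treated as a demonstration of the training setup rather than as usable translations.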

Training and evaluation data

More information needed
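The card names only the dataset above. A minimal sketch of loading it with the datasets library; the "fi-pl" config follows the pair in the header, and OPUS Books ships a single train split, so the evaluation set used here would have been carved out separately:

```python
from datasets import load_dataset

# Finnish-Polish sentence pairs from OPUS Books.
books = load_dataset("Helsinki-NLP/opus_books", "fi-pl")

# Each row holds a translation dict keyed by language code.
print(books["train"][0]["translation"])  # {'fi': '...', 'pl': '...'}
```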

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
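
A sketch of the same settings expressed as Seq2SeqTrainingArguments; the output directory is hypothetical, and the multi-GPU layout (4 devices × per-device batch size 8 = total batch size 32) comes from the launcher (e.g. torchrun or accelerate), not from these arguments:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-fi-pl",  # hypothetical; not stated in the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```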

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 0 | 0 | 17.9148 | 0 | 1.7136 | 0.2824 |
| No log | 1 | 70 | 17.9378 | 0.0078 | 2.0078 | 0.2747 |
| No log | 2 | 140 | 17.7525 | 0.0156 | 2.3068 | 0.2499 |
| No log | 3 | 210 | 17.2373 | 0.0312 | 2.5152 | 0.3278 |
| No log | 4 | 280 | 16.6606 | 0.0625 | 3.0234 | 0.2880 |
| No log | 5 | 350 | 15.7679 | 0.125 | 4.0189 | 0.3841 |
| No log | 6 | 420 | 13.5984 | 0.25 | 5.4762 | 0.3811 |
| 2.6579 | 7 | 490 | 11.1682 | 0.5 | 7.9828 | 0.4340 |
| 11.6973 | 8.0 | 560 | 7.9426 | 1.0 | 14.0710 | 0.3751 |
| 9.4107 | 9.0 | 630 | 5.9708 | 1.0 | 13.4130 | 0.3371 |
| 7.1475 | 10.0 | 700 | 5.1801 | 1.0 | 11.9684 | 0.5429 |
| 6.6668 | 11.0 | 770 | 4.9393 | 1.0 | 13.1774 | 0.4793 |
| 6.3727 | 12.0 | 840 | 4.7408 | 1.0 | 12.4711 | 0.6074 |
| 5.8705 | 13.0 | 910 | 4.5561 | 1.0 | 12.3122 | 0.6628 |
| 5.6557 | 14.0 | 980 | 4.4244 | 1.0 | 12.2256 | 0.8592 |
| 5.3732 | 15.0 | 1050 | 4.2971 | 1.0 | 12.4446 | 0.7637 |
| 5.2184 | 16.0 | 1120 | 4.1745 | 1.0 | 12.6563 | 0.4102 |
| 5.0985 | 17.0 | 1190 | 4.0587 | 1.0 | 13.1597 | 0.4544 |
| 4.9163 | 18.0 | 1260 | 3.9604 | 1.0 | 12.1062 | 0.6604 |
| 4.8539 | 19.0 | 1330 | 3.9035 | 1.0 | 12.4323 | 0.6412 |
| 4.7189 | 20.0 | 1400 | 3.8508 | 1.0 | 12.7998 | 0.7608 |
| 4.6147 | 21.0 | 1470 | 3.8001 | 1.0 | 13.0418 | 0.7460 |
| 4.6074 | 22.0 | 1540 | 3.7722 | 1.0 | 13.3161 | 0.7507 |
| 4.4924 | 23.0 | 1610 | 3.7449 | 1.0 | 13.2591 | 0.7690 |
| 4.4508 | 24.0 | 1680 | 3.7120 | 1.0 | 13.6368 | 0.7716 |
| 4.3653 | 25.0 | 1750 | 3.6865 | 1.0 | 14.0664 | 0.8044 |
| 4.3382 | 26.0 | 1820 | 3.6724 | 1.0 | 12.4489 | 0.7833 |
| 4.3181 | 27.0 | 1890 | 3.6544 | 1.0 | 12.5460 | 0.7901 |
| 4.2189 | 28.0 | 1960 | 3.6379 | 1.0 | 12.9568 | 0.8633 |
| 4.1794 | 29.0 | 2030 | 3.6218 | 1.0 | 12.9708 | 0.9159 |
| 4.1435 | 30.0 | 2100 | 3.6093 | 1.0 | 13.7295 | 0.9259 |
| 4.1034 | 31.0 | 2170 | 3.6055 | 1.0 | 13.3488 | 0.8953 |
| 4.1007 | 32.0 | 2240 | 3.5849 | 1.0 | 13.7861 | 0.9364 |
| 4.0429 | 33.0 | 2310 | 3.5790 | 1.0 | 14.1256 | 0.9593 |
| 3.9885 | 34.0 | 2380 | 3.5701 | 1.0 | 14.4190 | 0.9911 |
| 3.9478 | 35.0 | 2450 | 3.5599 | 1.0 | 12.3155 | 1.0229 |
| 3.9475 | 36.0 | 2520 | 3.5523 | 1.0 | 12.7013 | 1.0155 |
| 3.9098 | 37.0 | 2590 | 3.5454 | 1.0 | 12.9939 | 1.0507 |
| 3.8731 | 38.0 | 2660 | 3.5432 | 1.0 | 13.0388 | 1.0253 |
| 3.8486 | 39.0 | 2730 | 3.5382 | 1.0 | 13.2780 | 1.0464 |
| 3.8009 | 40.0 | 2800 | 3.5336 | 1.0 | 13.4696 | 1.0871 |
| 3.7859 | 41.0 | 2870 | 3.5298 | 1.0 | 13.9548 | 1.1019 |
| 3.8035 | 42.0 | 2940 | 3.5214 | 1.0 | 13.4818 | 1.1478 |
| 3.7408 | 43.0 | 3010 | 3.5181 | 1.0 | 14.0851 | 1.1370 |
| 3.7268 | 44.0 | 3080 | 3.5147 | 1.0 | 12.4496 | 1.1332 |
| 3.6811 | 45.0 | 3150 | 3.5205 | 1.0 | 12.6512 | 1.1282 |
| 3.6578 | 46.0 | 3220 | 3.5103 | 1.0 | 12.9433 | 1.1536 |
| 3.6635 | 47.0 | 3290 | 3.5073 | 1.0 | 13.0345 | 1.1905 |
| 3.6293 | 48.0 | 3360 | 3.5117 | 1.0 | 13.1041 | 1.1758 |
| 3.6062 | 49.0 | 3430 | 3.5035 | 1.0 | 12.8431 | 1.1830 |
| 3.6127 | 50.0 | 3500 | 3.4986 | 1.0 | 13.5887 | 1.2326 |
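
The Bleu column above is presumably corpus-level BLEU on the evaluation set; the card does not say which implementation produced it, though values above 1.0 suggest the 0-100 scale. A sketch of scoring decoded outputs with SacreBLEU through the evaluate library (the prediction and reference strings are hypothetical):

```python
import evaluate

bleu = evaluate.load("sacrebleu")

predictions = ["Dzień dobry!"]   # hypothetical decoded model outputs
references = [["Dzień dobry!"]]  # hypothetical gold Polish translations
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus BLEU on a 0-100 scale
```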

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Safetensors: 0.6B params (F32)

Model tree for contemmcm/10675fdf2ffcc008846278e8d48333a2

  • Base model: google/umt5-small