3c58d0bab7c50f6e8dfc30ee52147218

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [de-nl] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.6168
  • Data Size: 1.0 (fraction of the training data in use)
  • Epoch Runtime: 62.5072
  • BLEU: 6.9745
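
For reference, the final validation loss of 2.6168 corresponds to a per-token perplexity of exp(2.6168) ≈ 13.7, assuming the loss is mean token-level cross-entropy in nats.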

Model description

More information needed

Intended uses & limitations

More information needed
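
No usage example is provided upstream, so the following is a minimal inference sketch, assuming the checkpoint behaves like a standard Hugging Face seq2seq translation model. Whether a T5-style task prefix (e.g. "translate German to Dutch:") was used during fine-tuning is not stated in this card, so the prompt format below is an assumption.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/3c58d0bab7c50f6e8dfc30ee52147218"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: fine-tuning may have used a T5-style task prefix; drop the
# prefix if the model was trained on raw source sentences instead.
text = "translate German to Dutch: Das Buch liegt auf dem Tisch."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```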

Training and evaluation data

More information needed
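
The card does not describe how the evaluation set was constructed. For reference, the de-nl pair of Helsinki-NLP/opus_books can be loaded as sketched below; opus_books ships only a train split, so the 10% hold-out shown here is an illustrative assumption.

```python
from datasets import load_dataset

# opus_books exposes a single "train" split per language pair, so an
# evaluation set must be held out manually; the 10% fraction and seed
# below are assumptions, not values taken from this card.
raw = load_dataset("Helsinki-NLP/opus_books", "de-nl", split="train")
splits = raw.train_test_split(test_size=0.1, seed=42)

example = splits["train"][0]["translation"]
print(example["de"], "->", example["nl"])
```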

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction in code follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
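
For readers who want to reproduce the run, the list above maps onto Seq2SeqTrainingArguments roughly as follows. This is a hedged reconstruction, not the original training script: the output directory, predict_with_generate, and any logging/saving behavior are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Reconstruction of the listed hyperparameters. A per-device batch size
# of 8 across 4 GPUs yields the reported total batch size of 32.
training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-nl",  # assumption
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumption: needed for BLEU at eval time
)
```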

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | BLEU   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 15.6686         | 0         | 5.8408        | 0.2469 |
| No log        | 1     | 390   | 15.3559         | 0.0078    | 6.4347        | 0.2206 |
| No log        | 2     | 780   | 14.3436         | 0.0156    | 7.0920        | 0.2455 |
| No log        | 3     | 1170  | 13.0057         | 0.0312    | 8.1606        | 0.3001 |
| No log        | 4     | 1560  | 11.1104         | 0.0625    | 10.1428       | 0.3368 |
| 0.8431        | 5     | 1950  | 8.5109          | 0.125     | 13.8434       | 0.4086 |
| 1.4743        | 6     | 2340  | 5.8619          | 0.25      | 21.4355       | 0.4794 |
| 6.3219        | 7     | 2730  | 4.3964          | 0.5       | 33.9841       | 3.2816 |
| 4.9917        | 8     | 3120  | 3.7946          | 1.0       | 61.1269       | 2.4625 |
| 4.4862        | 9     | 3510  | 3.4671          | 1.0       | 60.8731       | 3.3469 |
| 4.2146        | 10    | 3900  | 3.3099          | 1.0       | 59.8231       | 3.8854 |
| 4.0429        | 11    | 4290  | 3.2172          | 1.0       | 60.1998       | 4.2481 |
| 3.913         | 12    | 4680  | 3.1442          | 1.0       | 61.5796       | 4.4836 |
| 3.8373        | 13    | 5070  | 3.0960          | 1.0       | 61.2461       | 4.6702 |
| 3.7399        | 14    | 5460  | 3.0555          | 1.0       | 60.0812       | 4.8506 |
| 3.6047        | 15    | 5850  | 3.0130          | 1.0       | 60.4809       | 5.0004 |
| 3.5615        | 16    | 6240  | 2.9855          | 1.0       | 59.5285       | 5.1945 |
| 3.5496        | 17    | 6630  | 2.9604          | 1.0       | 59.1054       | 5.2738 |
| 3.4475        | 18    | 7020  | 2.9298          | 1.0       | 60.1827       | 5.4059 |
| 3.3998        | 19    | 7410  | 2.9004          | 1.0       | 59.8943       | 5.4838 |
| 3.3728        | 20    | 7800  | 2.8899          | 1.0       | 61.1778       | 5.5841 |
| 3.3449        | 21    | 8190  | 2.8693          | 1.0       | 61.2387       | 5.6640 |
| 3.2885        | 22    | 8580  | 2.8398          | 1.0       | 61.3963       | 5.7439 |
| 3.2772        | 23    | 8970  | 2.8316          | 1.0       | 59.5462       | 5.8515 |
| 3.2107        | 24    | 9360  | 2.8136          | 1.0       | 59.9638       | 5.8933 |
| 3.1825        | 25    | 9750  | 2.8082          | 1.0       | 61.0646       | 5.9557 |
| 3.1763        | 26    | 10140 | 2.7892          | 1.0       | 59.8982       | 6.0271 |
| 3.1125        | 27    | 10530 | 2.7771          | 1.0       | 60.4019       | 6.0954 |
| 3.1258        | 28    | 10920 | 2.7660          | 1.0       | 59.3301       | 6.1228 |
| 3.0994        | 29    | 11310 | 2.7471          | 1.0       | 60.0506       | 6.1887 |
| 3.0527        | 30    | 11700 | 2.7310          | 1.0       | 60.1198       | 6.2310 |
| 2.9866        | 31    | 12090 | 2.7313          | 1.0       | 60.4529       | 6.2855 |
| 2.9615        | 32    | 12480 | 2.7176          | 1.0       | 61.4655       | 6.3646 |
| 2.9527        | 33    | 12870 | 2.7145          | 1.0       | 60.0622       | 6.4026 |
| 2.9215        | 34    | 13260 | 2.7037          | 1.0       | 60.4421       | 6.4369 |
| 2.9249        | 35    | 13650 | 2.7005          | 1.0       | 62.3335       | 6.4936 |
| 2.9285        | 36    | 14040 | 2.6947          | 1.0       | 62.6724       | 6.4983 |
| 2.8703        | 37    | 14430 | 2.6830          | 1.0       | 62.9112       | 6.5631 |
| 2.8218        | 38    | 14820 | 2.6731          | 1.0       | 62.3739       | 6.6175 |
| 2.8165        | 39    | 15210 | 2.6701          | 1.0       | 63.4032       | 6.6356 |
| 2.7918        | 40    | 15600 | 2.6651          | 1.0       | 64.0902       | 6.6879 |
| 2.7974        | 41    | 15990 | 2.6552          | 1.0       | 62.0214       | 6.7311 |
| 2.802         | 42    | 16380 | 2.6527          | 1.0       | 62.3090       | 6.7668 |
| 2.7434        | 43    | 16770 | 2.6454          | 1.0       | 62.3300       | 6.7609 |
| 2.7407        | 44    | 17160 | 2.6422          | 1.0       | 63.2888       | 6.8010 |
| 2.724         | 45    | 17550 | 2.6369          | 1.0       | 62.5514       | 6.8637 |
| 2.7156        | 46    | 17940 | 2.6350          | 1.0       | 63.5730       | 6.8781 |
| 2.6456        | 47    | 18330 | 2.6227          | 1.0       | 63.4639       | 6.9180 |
| 2.7103        | 48    | 18720 | 2.6244          | 1.0       | 62.5968       | 6.9265 |
| 2.646         | 49    | 19110 | 2.6157          | 1.0       | 63.1561       | 6.9574 |
| 2.6257        | 50    | 19500 | 2.6168          | 1.0       | 62.5072       | 6.9745 |
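
The BLEU column was most likely computed on generated translations against reference sentences; a minimal sketch in that style, assuming the evaluate library's sacrebleu metric and illustrative example sentences:

```python
import evaluate

# sacreBLEU takes plain-text hypotheses and one or more references per
# sentence; single references are wrapped in a list. The sentences here
# are illustrative, not taken from the evaluation set.
bleu = evaluate.load("sacrebleu")
predictions = ["Het boek ligt op de tafel."]
references = [["Het boek ligt op tafel."]]
print(bleu.compute(predictions=predictions, references=references)["score"])
```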

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1