53206cd9f8a5c89a5c063d2df26e6587

This model is a fine-tuned version of google/mt5-small on the de-nl subset of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4075
  • Data Size: 1.0 (fraction of the training set used; the schedule ramps up to the full set, see the table below)
  • Epoch Runtime: 56.8618 seconds
  • Bleu: 6.1217
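
For quick use, the checkpoint can be loaded with the standard Transformers seq2seq classes. A minimal inference sketch, assuming the repo id shown on the hosting page; the example sentence and generation settings are illustrative, and no task prefix is added since none is documented:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id for this checkpoint (not confirmed by the card text itself).
model_id = "contemmcm/53206cd9f8a5c89a5c063d2df26e6587"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German -> Dutch translation; beam search settings are illustrative.
inputs = tokenizer("Der Hund läuft durch den Park.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```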

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
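
Pending those details, the dataset named above can be loaded directly with the datasets library. A minimal sketch, assuming the standard de-nl configuration of opus_books:

```python
from datasets import load_dataset

# opus_books ships only a train split; the evaluation split used for the
# results above is not documented, so a manual split is one option.
dataset = load_dataset("Helsinki-NLP/opus_books", "de-nl")
print(dataset["train"][0]["translation"])  # {'de': ..., 'nl': ...}
```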

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
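
These settings map onto Seq2SeqTrainingArguments roughly as sketched below. The output directory, predict_with_generate, and the surrounding Trainer wiring are assumptions; the total batch size of 32 comes from launching on 4 GPUs rather than from an explicit argument:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-de-nl",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,    # x4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumption: needed for BLEU during eval
)
```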

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|---------------|-------|-------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0     | 26.6082         | 0         | 5.3460        | 0.0037 |
| No log        | 1     | 390   | 24.8507         | 0.0078    | 6.9534        | 0.0039 |
| No log        | 2     | 780   | 21.8076         | 0.0156    | 6.6067        | 0.0060 |
| No log        | 3     | 1170  | 18.9334         | 0.0312    | 8.1142        | 0.0033 |
| No log        | 4     | 1560  | 14.8386         | 0.0625    | 9.2352        | 0.0070 |
| 1.0192        | 5     | 1950  | 9.1086          | 0.125     | 12.6142       | 0.0117 |
| 1.537         | 6     | 2340  | 5.9527          | 0.25      | 19.1072       | 0.0181 |
| 5.3284        | 7     | 2730  | 3.7119          | 0.5       | 31.2303       | 1.1817 |
| 4.326         | 8.0   | 3120  | 3.3122          | 1.0       | 56.5618       | 2.1861 |
| 3.9947        | 9.0   | 3510  | 3.1591          | 1.0       | 56.2062       | 2.6803 |
| 3.8043        | 10.0  | 3900  | 3.0513          | 1.0       | 56.8248       | 3.0562 |
| 3.6641        | 11.0  | 4290  | 2.9802          | 1.0       | 57.5752       | 3.4920 |
| 3.5628        | 12.0  | 4680  | 2.9200          | 1.0       | 57.0053       | 3.7333 |
| 3.5145        | 13.0  | 5070  | 2.8800          | 1.0       | 56.2891       | 3.9421 |
| 3.4306        | 14.0  | 5460  | 2.8305          | 1.0       | 56.1577       | 4.1149 |
| 3.3156        | 15.0  | 5850  | 2.7955          | 1.0       | 56.5411       | 4.2152 |
| 3.2754        | 16.0  | 6240  | 2.7686          | 1.0       | 54.6214       | 4.4542 |
| 3.253         | 17.0  | 6630  | 2.7444          | 1.0       | 55.0890       | 4.4858 |
| 3.1731        | 18.0  | 7020  | 2.7225          | 1.0       | 56.6047       | 4.5586 |
| 3.148         | 19.0  | 7410  | 2.6913          | 1.0       | 55.6237       | 4.6820 |
| 3.1067        | 20.0  | 7800  | 2.6790          | 1.0       | 55.5254       | 4.7541 |
| 3.0622        | 21.0  | 8190  | 2.6594          | 1.0       | 56.0049       | 4.8190 |
| 3.0174        | 22.0  | 8580  | 2.6329          | 1.0       | 56.9300       | 4.8789 |
| 3.0019        | 23.0  | 8970  | 2.6250          | 1.0       | 55.3320       | 4.9717 |
| 2.9484        | 24.0  | 9360  | 2.6063          | 1.0       | 55.1256       | 5.0273 |
| 2.9187        | 25.0  | 9750  | 2.5972          | 1.0       | 56.3781       | 5.0626 |
| 2.9095        | 26.0  | 10140 | 2.5853          | 1.0       | 56.1055       | 5.1179 |
| 2.8524        | 27.0  | 10530 | 2.5713          | 1.0       | 56.4429       | 5.2100 |
| 2.8516        | 28.0  | 10920 | 2.5601          | 1.0       | 55.4443       | 5.2673 |
| 2.8325        | 29.0  | 11310 | 2.5430          | 1.0       | 56.2263       | 5.3145 |
| 2.8002        | 30.0  | 11700 | 2.5319          | 1.0       | 56.3833       | 5.3782 |
| 2.735         | 31.0  | 12090 | 2.5281          | 1.0       | 56.4733       | 5.4505 |
| 2.7312        | 32.0  | 12480 | 2.5184          | 1.0       | 56.4822       | 5.4686 |
| 2.7051        | 33.0  | 12870 | 2.5092          | 1.0       | 56.9100       | 5.5074 |
| 2.6724        | 34.0  | 13260 | 2.5067          | 1.0       | 55.9261       | 5.5596 |
| 2.674         | 35.0  | 13650 | 2.4875          | 1.0       | 55.5195       | 5.6071 |
| 2.6742        | 36.0  | 14040 | 2.4842          | 1.0       | 56.7434       | 5.6734 |
| 2.6162        | 37.0  | 14430 | 2.4774          | 1.0       | 56.5541       | 5.7158 |
| 2.581         | 38.0  | 14820 | 2.4693          | 1.0       | 55.7454       | 5.7728 |
| 2.5671        | 39.0  | 15210 | 2.4665          | 1.0       | 56.4713       | 5.7896 |
| 2.5664        | 40.0  | 15600 | 2.4549          | 1.0       | 55.3665       | 5.8258 |
| 2.5588        | 41.0  | 15990 | 2.4533          | 1.0       | 56.0915       | 5.8644 |
| 2.545         | 42.0  | 16380 | 2.4522          | 1.0       | 56.1284       | 5.8635 |
| 2.5216        | 43.0  | 16770 | 2.4378          | 1.0       | 55.3636       | 5.9064 |
| 2.5006        | 44.0  | 17160 | 2.4335          | 1.0       | 58.8562       | 5.9244 |
| 2.4887        | 45.0  | 17550 | 2.4291          | 1.0       | 58.5788       | 5.9757 |
| 2.4745        | 46.0  | 17940 | 2.4264          | 1.0       | 58.2341       | 6.0034 |
| 2.4107        | 47.0  | 18330 | 2.4197          | 1.0       | 59.1548       | 6.0582 |
| 2.4954        | 48.0  | 18720 | 2.4161          | 1.0       | 58.2660       | 6.0710 |
| 2.4066        | 49.0  | 19110 | 2.4099          | 1.0       | 58.7295       | 6.0844 |
| 2.4026        | 50.0  | 19500 | 2.4075          | 1.0       | 56.8618       | 6.1217 |
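
The Bleu column is most likely computed with a sacrebleu-style metric, though the exact setup is not documented here. A hedged sketch with the evaluate library:

```python
import evaluate

bleu = evaluate.load("sacrebleu")

# Toy prediction/reference pair; a real evaluation would decode the
# model's generations against the held-out references.
predictions = ["De hond rent door het park."]
references = [["De hond loopt door het park."]]
print(bleu.compute(predictions=predictions, references=references)["score"])
```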

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1