94bb060ec0754e3f06a89d47aaf32471

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [de-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1341
  • Data Size: 1.0 (fraction of the training set used)
  • Epoch Runtime: 69.4062 s
  • Bleu: 7.1151
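
For quick testing, a minimal inference sketch is shown below. It assumes the checkpoint is hosted as contemmcm/94bb060ec0754e3f06a89d47aaf32471 (the repo id shown for this model) and that no task prefix was used during fine-tuning; if training followed the stock T5 translation recipe, a prefix such as "translate German to Russian: " may be needed.

```python
# Minimal inference sketch; the repo id and generation settings are
# illustrative assumptions, not taken from the training configuration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/94bb060ec0754e3f06a89d47aaf32471"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German input in the opus_books domain (literary prose).
text = "Der Himmel war klar, und die Sterne leuchteten hell."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```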

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
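
A hedged reconstruction of this setup as transformers.Seq2SeqTrainingArguments is sketched below; output_dir is an illustrative name, and predict_with_generate is assumed because BLEU is reported during evaluation. The per-device batch size of 8 across 4 GPUs yields the total batch size of 32 listed above.

```python
# Hedged reconstruction of the hyperparameters above using
# transformers.Seq2SeqTrainingArguments; output_dir is assumed.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-ru",  # illustrative name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed, since BLEU is evaluated
)
```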

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-----------------:|:------:|
| No log | 0 | 0 | 16.8428 | 0 | 6.3346 | 0.3707 |
| No log | 1 | 434 | 16.1425 | 0.0078 | 6.9476 | 0.4614 |
| No log | 2 | 868 | 15.2326 | 0.0156 | 7.8733 | 0.4247 |
| No log | 3 | 1302 | 13.7919 | 0.0312 | 8.9728 | 0.4687 |
| No log | 4 | 1736 | 10.7401 | 0.0625 | 10.9514 | 0.4977 |
| 0.6279 | 5 | 2170 | 7.5709 | 0.125 | 14.9074 | 0.7105 |
| 8.0895 | 6 | 2604 | 5.1063 | 0.25 | 22.7906 | 1.1096 |
| 5.6065 | 7 | 3038 | 3.9826 | 0.5 | 37.4453 | 3.3622 |
| 4.4646 | 8 | 3472 | 3.1978 | 1.0 | 68.6496 | 2.5886 |
| 3.969 | 9 | 3906 | 2.9115 | 1.0 | 68.9235 | 3.4788 |
| 3.7369 | 10 | 4340 | 2.7801 | 1.0 | 70.0600 | 3.9344 |
| 3.58 | 11 | 4774 | 2.6955 | 1.0 | 68.3549 | 4.2262 |
| 3.4193 | 12 | 5208 | 2.6333 | 1.0 | 68.8951 | 4.5022 |
| 3.3395 | 13 | 5642 | 2.5834 | 1.0 | 68.6252 | 4.6858 |
| 3.2362 | 14 | 6076 | 2.5427 | 1.0 | 68.5016 | 4.8206 |
| 3.1812 | 15 | 6510 | 2.5083 | 1.0 | 68.8764 | 4.9863 |
| 3.1333 | 16 | 6944 | 2.4766 | 1.0 | 69.7473 | 5.1495 |
| 3.0722 | 17 | 7378 | 2.4409 | 1.0 | 69.0182 | 5.2746 |
| 3.0225 | 18 | 7812 | 2.4249 | 1.0 | 69.4311 | 5.3718 |
| 2.9148 | 19 | 8246 | 2.3973 | 1.0 | 69.1320 | 5.5334 |
| 2.9038 | 20 | 8680 | 2.3811 | 1.0 | 69.6505 | 5.6109 |
| 2.8683 | 21 | 9114 | 2.3615 | 1.0 | 69.2724 | 5.7119 |
| 2.8021 | 22 | 9548 | 2.3485 | 1.0 | 69.1170 | 5.8032 |
| 2.7796 | 23 | 9982 | 2.3283 | 1.0 | 70.3405 | 5.8634 |
| 2.7284 | 24 | 10416 | 2.3230 | 1.0 | 70.3135 | 5.9333 |
| 2.7251 | 25 | 10850 | 2.3071 | 1.0 | 70.4569 | 5.9683 |
| 2.6767 | 26 | 11284 | 2.2925 | 1.0 | 70.1294 | 6.0770 |
| 2.6636 | 27 | 11718 | 2.2789 | 1.0 | 70.4020 | 6.1429 |
| 2.5996 | 28 | 12152 | 2.2679 | 1.0 | 70.1688 | 6.1850 |
| 2.5917 | 29 | 12586 | 2.2601 | 1.0 | 70.4845 | 6.2361 |
| 2.5747 | 30 | 13020 | 2.2475 | 1.0 | 69.9181 | 6.2676 |
| 2.535 | 31 | 13454 | 2.2421 | 1.0 | 69.6476 | 6.3524 |
| 2.5088 | 32 | 13888 | 2.2290 | 1.0 | 69.9331 | 6.4197 |
| 2.5258 | 33 | 14322 | 2.2253 | 1.0 | 69.0732 | 6.4792 |
| 2.4744 | 34 | 14756 | 2.2185 | 1.0 | 69.4931 | 6.5395 |
| 2.4417 | 35 | 15190 | 2.2074 | 1.0 | 69.7491 | 6.5594 |
| 2.4442 | 36 | 15624 | 2.2061 | 1.0 | 69.7999 | 6.6204 |
| 2.4052 | 37 | 16058 | 2.1952 | 1.0 | 68.9810 | 6.7009 |
| 2.4058 | 38 | 16492 | 2.1866 | 1.0 | 69.3795 | 6.7066 |
| 2.3367 | 39 | 16926 | 2.1838 | 1.0 | 69.1257 | 6.7676 |
| 2.317 | 40 | 17360 | 2.1768 | 1.0 | 70.2986 | 6.8029 |
| 2.3178 | 41 | 17794 | 2.1735 | 1.0 | 69.7982 | 6.8361 |
| 2.2817 | 42 | 18228 | 2.1646 | 1.0 | 71.4308 | 6.8576 |
| 2.2973 | 43 | 18662 | 2.1581 | 1.0 | 70.2101 | 6.9236 |
| 2.2728 | 44 | 19096 | 2.1546 | 1.0 | 72.9310 | 6.9524 |
| 2.2672 | 45 | 19530 | 2.1521 | 1.0 | 69.0454 | 7.0105 |
| 2.2194 | 46 | 19964 | 2.1533 | 1.0 | 73.6460 | 7.0428 |
| 2.1964 | 47 | 20398 | 2.1439 | 1.0 | 70.7197 | 7.0679 |
| 2.2149 | 48 | 20832 | 2.1327 | 1.0 | 72.6235 | 7.1224 |
| 2.2026 | 49 | 21266 | 2.1374 | 1.0 | 68.5771 | 7.1457 |
| 2.1986 | 50 | 21700 | 2.1341 | 1.0 | 69.4062 | 7.1151 |
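
In the table above, the Data Size column appears to follow a progressive data schedule: the fraction of the training set doubles each epoch (0.0078 → 0.0156 → … → 0.5) until the full dataset is used from epoch 8 onward. The BLEU column is presumably computed by decoding validation predictions and scoring them with sacreBLEU; the exact metric configuration is not documented here, so the sketch below is an assumption based on a common evaluate-library setup.

```python
# Hedged sketch of how a BLEU score like the column above is commonly
# computed (assumes the sacrebleu metric from the `evaluate` library;
# the card does not document the actual metric configuration).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Небо было ясным, и звёзды ярко сияли."]       # decoded model outputs
references = [["Небо было чистым, и звёзды светили ярко."]]   # gold Russian targets
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # sacreBLEU reports on a 0-100 scale
```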

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1