acbacdfa50c20b11ca4810175f066222

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [en-es] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2036
  • Data Size: 1.0
  • Epoch Runtime: 369.2799
  • Bleu: 7.6026
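To put the loss in perspective: assuming the reported eval Loss is a mean token-level cross-entropy in nats (the usual Hugging Face Trainer convention; an assumption, not stated in this card), it corresponds to a perplexity of exp(loss):

```python
import math

# Assumption: eval Loss is mean token-level cross-entropy in nats,
# so perplexity = exp(loss).
eval_loss = 2.2036
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 9.06
```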

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
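The total batch sizes above follow directly from the per-device settings under data-parallel training, where each of the 4 devices processes its own batch per step:

```python
# Sketch: total batch size under data-parallel training is the
# per-device batch size multiplied by the number of devices.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4  # multi-GPU, one process per device

total_train_batch_size = per_device_train_batch_size * num_devices
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 32
```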

Training results

| Training Loss | Epoch | Step   | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:------:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0      | 16.1008         | 0         | 30.7191       | 0.1376 |
| No log        | 1     | 2336   | 15.4111         | 0.0078    | 33.9916       | 0.1382 |
| 0.2628        | 2     | 4672   | 11.6741         | 0.0156    | 36.9915       | 0.1409 |
| 0.3107        | 3     | 7008   | 7.5072          | 0.0312    | 42.3120       | 0.2862 |
| 6.9196        | 4     | 9344   | 4.6997          | 0.0625    | 53.1233       | 1.4737 |
| 5.3873        | 5     | 11680  | 3.9288          | 0.125     | 73.4840       | 2.1529 |
| 4.487         | 6     | 14016  | 3.4148          | 0.25      | 116.9824      | 2.5966 |
| 4.0544        | 7     | 16352  | 3.1414          | 0.5       | 199.2163      | 3.2885 |
| 3.6842        | 8.0   | 18688  | 2.9427          | 1.0       | 364.7632      | 4.0592 |
| 3.4452        | 9.0   | 21024  | 2.8255          | 1.0       | 364.2480      | 4.5240 |
| 3.3093        | 10.0  | 23360  | 2.7579          | 1.0       | 363.0636      | 4.8231 |
| 3.2422        | 11.0  | 25696  | 2.7009          | 1.0       | 364.7663      | 5.0897 |
| 3.1326        | 12.0  | 28032  | 2.6510          | 1.0       | 364.8785      | 5.3149 |
| 3.0436        | 13.0  | 30368  | 2.6195          | 1.0       | 362.5315      | 5.4792 |
| 3.0106        | 14.0  | 32704  | 2.5788          | 1.0       | 362.0513      | 5.6407 |
| 2.9365        | 15.0  | 35040  | 2.5521          | 1.0       | 362.4158      | 5.7836 |
| 2.9626        | 16.0  | 37376  | 2.5179          | 1.0       | 362.7865      | 5.9337 |
| 2.8353        | 17.0  | 39712  | 2.4982          | 1.0       | 365.3876      | 6.0555 |
| 2.8446        | 18.0  | 42048  | 2.4835          | 1.0       | 364.7898      | 6.1210 |
| 2.8111        | 19.0  | 44384  | 2.4641          | 1.0       | 371.6591      | 6.2001 |
| 2.7674        | 20.0  | 46720  | 2.4445          | 1.0       | 383.2359      | 6.2968 |
| 2.7096        | 21.0  | 49056  | 2.4256          | 1.0       | 385.3468      | 6.3968 |
| 2.6922        | 22.0  | 51392  | 2.4156          | 1.0       | 385.5504      | 6.4567 |
| 2.652         | 23.0  | 53728  | 2.3956          | 1.0       | 384.0397      | 6.5348 |
| 2.5982        | 24.0  | 56064  | 2.3923          | 1.0       | 381.7050      | 6.5862 |
| 2.6054        | 25.0  | 58400  | 2.3686          | 1.0       | 386.3455      | 6.6673 |
| 2.5553        | 26.0  | 60736  | 2.3560          | 1.0       | 383.9699      | 6.7479 |
| 2.5704        | 27.0  | 63072  | 2.3517          | 1.0       | 379.6998      | 6.7864 |
| 2.5291        | 28.0  | 65408  | 2.3349          | 1.0       | 385.1407      | 6.8487 |
| 2.495         | 29.0  | 67744  | 2.3311          | 1.0       | 380.7480      | 6.8744 |
| 2.5026        | 30.0  | 70080  | 2.3220          | 1.0       | 367.5770      | 6.9267 |
| 2.4608        | 31.0  | 72416  | 2.3145          | 1.0       | 363.9416      | 6.9729 |
| 2.4994        | 32.0  | 74752  | 2.2948          | 1.0       | 366.9948      | 7.0238 |
| 2.4301        | 33.0  | 77088  | 2.2908          | 1.0       | 366.0870      | 7.0714 |
| 2.4414        | 34.0  | 79424  | 2.2933          | 1.0       | 366.1123      | 7.1139 |
| 2.4087        | 35.0  | 81760  | 2.2787          | 1.0       | 364.9032      | 7.1458 |
| 2.4174        | 36.0  | 84096  | 2.2691          | 1.0       | 364.1023      | 7.1781 |
| 2.3635        | 37.0  | 86432  | 2.2636          | 1.0       | 362.4927      | 7.2447 |
| 2.3803        | 38.0  | 88768  | 2.2673          | 1.0       | 365.5410      | 7.2437 |
| 2.3771        | 39.0  | 91104  | 2.2526          | 1.0       | 366.7453      | 7.2992 |
| 2.365         | 40.0  | 93440  | 2.2541          | 1.0       | 364.3133      | 7.3154 |
| 2.339         | 41.0  | 95776  | 2.2431          | 1.0       | 366.3519      | 7.3661 |
| 2.3013        | 42.0  | 98112  | 2.2383          | 1.0       | 370.7902      | 7.3864 |
| 2.2845        | 43.0  | 100448 | 2.2416          | 1.0       | 368.4315      | 7.4176 |
| 2.2383        | 44.0  | 102784 | 2.2310          | 1.0       | 365.7991      | 7.4575 |
| 2.2968        | 45.0  | 105120 | 2.2163          | 1.0       | 368.7937      | 7.4876 |
| 2.2331        | 46.0  | 107456 | 2.2186          | 1.0       | 376.4598      | 7.5047 |
| 2.2592        | 47.0  | 109792 | 2.2192          | 1.0       | 368.2389      | 7.5199 |
| 2.2147        | 48.0  | 112128 | 2.2200          | 1.0       | 370.3951      | 7.5408 |
| 2.2242        | 49.0  | 114464 | 2.2059          | 1.0       | 373.3496      | 7.5755 |
| 2.2101        | 50.0  | 116800 | 2.2036          | 1.0       | 369.2799      | 7.6026 |
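The Bleu column tracks translation quality as a geometric mean of clipped n-gram precisions with a brevity penalty. A minimal self-contained sketch of that metric (single reference per hypothesis, no smoothing; real evaluations typically use a library such as sacreBLEU, and this card does not state which implementation produced the scores above):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100): geometric mean of clipped n-gram
    precisions times a brevity penalty. Simplified: one reference per
    hypothesis, no smoothing (any zero precision gives BLEU = 0)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped counts
            totals[n - 1] += sum(h.values())
    if 0 in matches:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

For example, an exact match scores 100, while a hypothesis sharing no n-grams with its reference scores 0.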

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1