0b99f2c59efa339bc669ee1ad6404947

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [en-no] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.8015
  • Data Size: 1.0
  • Epoch Runtime: 16.1127
  • Bleu: 6.5099
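
No usage example is included in the card; a minimal inference sketch is given below. It assumes the checkpoint is available under the Hub id `contemmcm/0b99f2c59efa339bc669ee1ad6404947` and that the model expects a T5-style task prefix; the actual prompt format used during fine-tuning is not documented here.

```python
# Minimal inference sketch. Assumptions: the Hub id below, and a T5-style
# task prefix; the card does not document the prompt format used in training.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/0b99f2c59efa339bc669ee1ad6404947"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "translate English to Norwegian: The book is on the table."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```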

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
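
For reproduction, these settings map onto the Transformers `Seq2SeqTrainingArguments` roughly as sketched below; the per-device batch size of 8 across 4 GPUs yields the listed total batch size of 32. Everything not in the list above (such as the output directory) is illustrative.

```python
# Sketch of the hyperparameters above as Seq2SeqTrainingArguments.
# Only the listed values come from the card; other arguments are illustrative.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-en-no",  # illustrative name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,   # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # needed to compute BLEU during evaluation
)
```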

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 19.5040         | 0         | 1.8164        | 0.0551 |
| No log        | 1     | 87   | 19.7877         | 0.0078    | 2.3806        | 0.0613 |
| No log        | 2     | 174  | 19.4318         | 0.0156    | 2.5573        | 0.0522 |
| No log        | 3     | 261  | 19.1174         | 0.0312    | 2.8387        | 0.0869 |
| No log        | 4     | 348  | 18.1931         | 0.0625    | 3.3181        | 0.0857 |
| 0.8606        | 5     | 435  | 16.8773         | 0.125     | 4.0564        | 0.0827 |
| 4.411         | 6     | 522  | 13.8636         | 0.25      | 6.1705        | 0.1425 |
| 5.3619        | 7     | 609  | 9.9907          | 0.5       | 9.3401        | 0.1464 |
| 6.7979        | 8     | 696  | 6.9065          | 1.0       | 15.9693       | 0.2216 |
| 8.0446        | 9     | 783  | 5.1407          | 1.0       | 14.3013       | 0.7074 |
| 6.6046        | 10    | 870  | 4.5308          | 1.0       | 15.0072       | 2.4172 |
| 5.8458        | 11    | 957  | 4.2031          | 1.0       | 15.0180       | 2.5097 |
| 5.5477        | 12    | 1044 | 3.9590          | 1.0       | 15.3057       | 2.6652 |
| 5.1566        | 13    | 1131 | 3.7686          | 1.0       | 15.2963       | 3.0976 |
| 4.83          | 14    | 1218 | 3.5775          | 1.0       | 15.1575       | 3.5723 |
| 4.5322        | 15    | 1305 | 3.4101          | 1.0       | 15.9932       | 3.9312 |
| 4.445         | 16    | 1392 | 3.2813          | 1.0       | 14.4618       | 4.3110 |
| 4.2768        | 17    | 1479 | 3.2138          | 1.0       | 15.2863       | 4.5780 |
| 4.1393        | 18    | 1566 | 3.1576          | 1.0       | 15.1322       | 4.8100 |
| 4.0361        | 19    | 1653 | 3.1251          | 1.0       | 15.1464       | 4.9201 |
| 3.9382        | 20    | 1740 | 3.0968          | 1.0       | 15.1369       | 5.0012 |
| 3.84          | 21    | 1827 | 3.0731          | 1.0       | 15.4416       | 5.1079 |
| 3.8089        | 22    | 1914 | 3.0324          | 1.0       | 16.1804       | 5.2143 |
| 3.7134        | 23    | 2001 | 3.0245          | 1.0       | 14.2684       | 5.4818 |
| 3.6262        | 24    | 2088 | 2.9995          | 1.0       | 14.8129       | 5.4222 |
| 3.6048        | 25    | 2175 | 2.9825          | 1.0       | 14.7220       | 5.4213 |
| 3.5626        | 26    | 2262 | 2.9654          | 1.0       | 14.9187       | 5.5030 |
| 3.5273        | 27    | 2349 | 2.9595          | 1.0       | 15.1866       | 5.5954 |
| 3.4709        | 28    | 2436 | 2.9402          | 1.0       | 15.7237       | 5.6550 |
| 3.4198        | 29    | 2523 | 2.9172          | 1.0       | 15.7879       | 5.7443 |
| 3.4119        | 30    | 2610 | 2.9093          | 1.0       | 14.3025       | 5.7331 |
| 3.3313        | 31    | 2697 | 2.8949          | 1.0       | 14.2849       | 5.8333 |
| 3.3101        | 32    | 2784 | 2.8836          | 1.0       | 14.6021       | 5.9586 |
| 3.2747        | 33    | 2871 | 2.8879          | 1.0       | 14.7945       | 5.9238 |
| 3.2538        | 34    | 2958 | 2.8790          | 1.0       | 15.1808       | 6.0059 |
| 3.2457        | 35    | 3045 | 2.8797          | 1.0       | 14.7276       | 6.0394 |
| 3.1796        | 36    | 3132 | 2.8599          | 1.0       | 14.9706       | 6.0593 |
| 3.1589        | 37    | 3219 | 2.8587          | 1.0       | 15.5686       | 6.1553 |
| 3.1516        | 38    | 3306 | 2.8529          | 1.0       | 14.5410       | 6.1577 |
| 3.1018        | 39    | 3393 | 2.8384          | 1.0       | 14.8202       | 6.2180 |
| 3.089         | 40    | 3480 | 2.8378          | 1.0       | 15.0126       | 6.2073 |
| 3.0987        | 41    | 3567 | 2.8307          | 1.0       | 15.2370       | 6.2604 |
| 3.0259        | 42    | 3654 | 2.8352          | 1.0       | 15.0191       | 6.2415 |
| 3.0104        | 43    | 3741 | 2.8241          | 1.0       | 15.6302       | 6.2377 |
| 2.9799        | 44    | 3828 | 2.8255          | 1.0       | 15.0524       | 6.2596 |
| 2.9478        | 45    | 3915 | 2.8068          | 1.0       | 15.0966       | 6.2095 |
| 2.9045        | 46    | 4002 | 2.8273          | 1.0       | 14.4436       | 6.3158 |
| 2.8668        | 47    | 4089 | 2.8058          | 1.0       | 14.7387       | 6.3670 |
| 2.8903        | 48    | 4176 | 2.8119          | 1.0       | 15.5057       | 6.5110 |
| 2.8865        | 49    | 4263 | 2.7904          | 1.0       | 15.5379       | 6.5642 |
| 2.8611        | 50    | 4350 | 2.8015          | 1.0       | 16.1127       | 6.5099 |
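
The card does not state how the Bleu column was computed. If it follows the common Transformers translation setup, the scores would come from sacreBLEU via the `evaluate` library, roughly as in this sketch (the tooling is an assumption; the example strings are illustrative):

```python
# BLEU scoring sketch with sacreBLEU via the evaluate library
# (assumed tooling; the card does not specify how Bleu was computed).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Boken ligger på bordet."]    # decoded model outputs (illustrative)
references = [["Boken ligger på bordet."]]   # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # sacreBLEU uses a 0-100 scale, as in the table
```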

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1