ece5ab9ed976091d6c72ef23b91fc802

This model is a fine-tuned version of google/mt5-small on the de-en configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0637
  • Data Size: 1.0
  • Epoch Runtime: 183.9056
  • Bleu: 8.3106
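
Since the card itself carries little usage detail, here is a minimal German-to-English inference sketch. It assumes the Hugging Face repository id contemmcm/ece5ab9ed976091d6c72ef23b91fc802 and that no task prefix is required; neither assumption is confirmed by this card.

```python
# Minimal de->en inference sketch. The repo id and the absence of a task
# prefix are assumptions; the card does not document either.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/ece5ab9ed976091d6c72ef23b91fc802"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```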

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
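
The dataset named in the summary can be pulled directly from the Hub. Below is a minimal loading sketch for the de-en configuration of Helsinki-NLP/opus_books; how the evaluation split was carved out is not stated on this card, so no split handling is shown.

```python
# Load the de-en configuration of opus_books. The card does not document
# how evaluation data was held out, so only the raw dataset is shown.
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus_books", "de-en")
example = dataset["train"][0]["translation"]
print(example["de"], "->", example["en"])
```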

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
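
The listed values map onto a Seq2SeqTrainingArguments configuration roughly as sketched below; the per-device batch size of 8 across 4 GPUs yields the total batch size of 32. The output_dir, evaluation cadence, and predict_with_generate flag are assumptions, since the card does not publish the actual training script.

```python
# Sketch of training arguments mirroring the hyperparameter list above.
# output_dir, eval_strategy, and predict_with_generate are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-de-en",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    eval_strategy="epoch",           # assumed; the table reports per-epoch metrics
    predict_with_generate=True,      # assumed; needed to compute BLEU at eval time
)
```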

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 25.8422         | 0         | 15.7176       | 0.0015 |
| No log        | 1     | 1286  | 18.2875         | 0.0078    | 17.4798       | 0.0036 |
| 0.4342        | 2     | 2572  | 12.3803         | 0.0156    | 19.1922       | 0.0046 |
| 0.419         | 3     | 3858  | 8.5398          | 0.0312    | 22.0140       | 0.0087 |
| 0.4323        | 4     | 5144  | 4.7204          | 0.0625    | 27.2403       | 0.0366 |
| 4.8912        | 5     | 6430  | 3.5329          | 0.125     | 38.2339       | 0.8359 |
| 4.1474        | 6     | 7716  | 3.1755          | 0.25      | 59.1255       | 1.8778 |
| 3.7366        | 7     | 9002  | 2.9314          | 0.5       | 102.0389      | 2.6961 |
| 3.4324        | 8     | 10288 | 2.7399          | 1.0       | 182.3252      | 3.9365 |
| 3.2368        | 9     | 11574 | 2.6477          | 1.0       | 182.4657      | 4.2084 |
| 3.0952        | 10    | 12860 | 2.5687          | 1.0       | 181.8133      | 4.8790 |
| 2.9661        | 11    | 14146 | 2.5202          | 1.0       | 178.4277      | 5.2568 |
| 2.9532        | 12    | 15432 | 2.4762          | 1.0       | 181.7542      | 5.4523 |
| 2.8393        | 13    | 16718 | 2.4361          | 1.0       | 178.8124      | 5.6336 |
| 2.8024        | 14    | 18004 | 2.4072          | 1.0       | 198.5045      | 5.9432 |
| 2.8015        | 15    | 19290 | 2.3840          | 1.0       | 187.5317      | 5.9987 |
| 2.6931        | 16    | 20576 | 2.3576          | 1.0       | 180.5806      | 6.1513 |
| 2.675         | 17    | 21862 | 2.3377          | 1.0       | 180.5626      | 6.2539 |
| 2.6371        | 18    | 23148 | 2.3205          | 1.0       | 183.0677      | 6.4511 |
| 2.6272        | 19    | 24434 | 2.2969          | 1.0       | 183.5103      | 6.5279 |
| 2.5562        | 20    | 25720 | 2.2797          | 1.0       | 184.3482      | 6.7305 |
| 2.5346        | 21    | 27006 | 2.2661          | 1.0       | 182.3010      | 6.6796 |
| 2.511         | 22    | 28292 | 2.2504          | 1.0       | 182.9612      | 6.9751 |
| 2.5031        | 23    | 29578 | 2.2401          | 1.0       | 183.1818      | 7.0070 |
| 2.455         | 24    | 30864 | 2.2251          | 1.0       | 186.2117      | 7.0644 |
| 2.4241        | 25    | 32150 | 2.2154          | 1.0       | 186.9180      | 7.0505 |
| 2.4445        | 26    | 33436 | 2.2056          | 1.0       | 187.5196      | 7.2376 |
| 2.3923        | 27    | 34722 | 2.1965          | 1.0       | 192.6508      | 7.2181 |
| 2.3642        | 28    | 36008 | 2.1862          | 1.0       | 194.6447      | 7.3394 |
| 2.3329        | 29    | 37294 | 2.1786          | 1.0       | 188.6610      | 7.3962 |
| 2.3296        | 30    | 38580 | 2.1667          | 1.0       | 187.3074      | 7.4217 |
| 2.2623        | 31    | 39866 | 2.1605          | 1.0       | 186.3852      | 7.6256 |
| 2.2493        | 32    | 41152 | 2.1522          | 1.0       | 189.2922      | 7.5598 |
| 2.2623        | 33    | 42438 | 2.1510          | 1.0       | 187.9119      | 7.6353 |
| 2.2926        | 34    | 43724 | 2.1389          | 1.0       | 186.6194      | 7.7151 |
| 2.1919        | 35    | 45010 | 2.1375          | 1.0       | 186.7181      | 7.6615 |
| 2.2415        | 36    | 46296 | 2.1297          | 1.0       | 187.6718      | 7.7543 |
| 2.2026        | 37    | 47582 | 2.1246          | 1.0       | 187.8164      | 7.7978 |
| 2.1982        | 38    | 48868 | 2.1098          | 1.0       | 187.1606      | 7.8684 |
| 2.164         | 39    | 50154 | 2.1155          | 1.0       | 187.8143      | 8.0013 |
| 2.1166        | 40    | 51440 | 2.1168          | 1.0       | 188.3499      | 7.9425 |
| 2.1146        | 41    | 52726 | 2.1078          | 1.0       | 186.1224      | 8.0526 |
| 2.1115        | 42    | 54012 | 2.0968          | 1.0       | 184.8816      | 7.9680 |
| 2.1006        | 43    | 55298 | 2.0926          | 1.0       | 183.8539      | 8.0582 |
| 2.0808        | 44    | 56584 | 2.0892          | 1.0       | 190.0827      | 8.0700 |
| 2.0633        | 45    | 57870 | 2.0849          | 1.0       | 189.3815      | 8.0860 |
| 2.0851        | 46    | 59156 | 2.0788          | 1.0       | 183.5700      | 8.1511 |
| 2.0412        | 47    | 60442 | 2.0803          | 1.0       | 186.8171      | 8.0862 |
| 2.0088        | 48    | 61728 | 2.0873          | 1.0       | 191.0245      | 8.1643 |
| 2.0184        | 49    | 63014 | 2.0724          | 1.0       | 186.8023      | 8.1502 |
| 2.027         | 50    | 64300 | 2.0637          | 1.0       | 183.9056      | 8.3106 |
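
The Bleu column above can be reproduced with a sacrebleu-style metric over the generated translations. Below is a minimal scoring sketch; the evaluate metric and the example strings are illustrative assumptions, not the card's actual evaluation code.

```python
# Sketch of BLEU scoring with the evaluate library (illustrative only;
# the card does not publish its exact evaluation pipeline).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The book is on the table."]
references = [["The book lies on the table."]]
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # BLEU on a 0-100 scale, as in the table above
```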

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1