contemmcm/f5b7419600bbc19f534df6daa8c13887

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [it-nl] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1133
  • Data Size: 1.0
  • Epoch Runtime: 9.9173
  • BLEU: 1.4951
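The checkpoint can be loaded with the standard `transformers` seq2seq API. A minimal inference sketch, assuming the repository id `contemmcm/f5b7419600bbc19f534df6daa8c13887` shown on this card, and assuming an Italian→Dutch direction (the `[it-nl]` dataset config does not state the direction explicitly):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repository id taken from this model card; the it -> nl direction is an assumption.
model_id = "contemmcm/f5b7419600bbc19f534df6daa8c13887"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

def translate(text: str) -> str:
    """Greedy-decode a single sentence with the fine-tuned mT5 model."""
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(translate("Il gatto dorme sul divano."))
```

Given the low BLEU score above, output quality should be expected to be rough; this is only a loading/decoding example.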

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
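The batch-size and step bookkeeping above can be sanity-checked against the results table below with plain arithmetic (no training code involved):

```python
# Batch-size bookkeeping for the multi-GPU run described above.
train_batch_size = 8          # per device
num_devices = 4
total_train_batch_size = train_batch_size * num_devices
assert total_train_batch_size == 32

# The results table logs 58 optimizer steps per epoch over 50 epochs.
steps_per_epoch = 58
num_epochs = 50
total_steps = steps_per_epoch * num_epochs
assert total_steps == 2900    # matches the final Step entry in the table

# At full data size, 58 steps x 32 examples ~= 1856 training examples per epoch.
examples_per_epoch = steps_per_epoch * total_train_batch_size
print(total_train_batch_size, total_steps, examples_per_epoch)
```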

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 23.3744         | 0         | 1.6147        | 0.0010 |
| No log        | 1     | 58   | 23.1855         | 0.0078    | 1.6773        | 0.0015 |
| No log        | 2     | 116  | 22.8682         | 0.0156    | 1.7411        | 0.0015 |
| No log        | 3     | 174  | 22.4285         | 0.0312    | 2.1981        | 0.0013 |
| No log        | 4     | 232  | 21.2909         | 0.0625    | 2.5757        | 0.0015 |
| No log        | 5     | 290  | 18.6704         | 0.125     | 2.9737        | 0.0014 |
| 1.9421        | 6     | 348  | 15.5708         | 0.25      | 3.9325        | 0.0021 |
| 2.5078        | 7     | 406  | 10.8684         | 0.5       | 5.6988        | 0.0028 |
| 8.8154        | 8     | 464  | 6.1040          | 1.0       | 9.7775        | 0.0222 |
| 7.3386        | 9     | 522  | 4.9965          | 1.0       | 9.5529        | 0.0366 |
| 5.9974        | 10    | 580  | 4.2617          | 1.0       | 9.8360        | 0.0584 |
| 5.2919        | 11    | 638  | 3.9682          | 1.0       | 9.5438        | 0.0889 |
| 4.9133        | 12    | 696  | 3.8141          | 1.0       | 9.8508        | 0.4793 |
| 4.5573        | 13    | 754  | 3.7247          | 1.0       | 9.5140        | 0.5970 |
| 4.4588        | 14    | 812  | 3.6695          | 1.0       | 9.4895        | 0.6336 |
| 4.3457        | 15    | 870  | 3.6156          | 1.0       | 9.5015        | 0.6482 |
| 4.2786        | 16    | 928  | 3.5704          | 1.0       | 9.8842        | 0.7365 |
| 4.2259        | 17    | 986  | 3.5328          | 1.0       | 10.2678       | 0.8037 |
| 4.1609        | 18    | 1044 | 3.5001          | 1.0       | 10.2281       | 0.8533 |
| 4.078         | 19    | 1102 | 3.4749          | 1.0       | 10.2322       | 0.8296 |
| 4.0242        | 20    | 1160 | 3.4503          | 1.0       | 10.8015       | 0.9442 |
| 3.9846        | 21    | 1218 | 3.4242          | 1.0       | 9.7304        | 0.9094 |
| 3.9568        | 22    | 1276 | 3.4005          | 1.0       | 9.6782        | 0.9428 |
| 3.923         | 23    | 1334 | 3.3833          | 1.0       | 9.8428        | 0.9771 |
| 3.8988        | 24    | 1392 | 3.3621          | 1.0       | 9.8737        | 1.0161 |
| 3.8403        | 25    | 1450 | 3.3472          | 1.0       | 9.9675        | 1.0115 |
| 3.8154        | 26    | 1508 | 3.3342          | 1.0       | 10.4292       | 1.0457 |
| 3.7767        | 27    | 1566 | 3.3146          | 1.0       | 10.8942       | 1.0446 |
| 3.7655        | 28    | 1624 | 3.3019          | 1.0       | 10.8581       | 1.0148 |
| 3.7476        | 29    | 1682 | 3.2869          | 1.0       | 10.8080       | 1.1099 |
| 3.7219        | 30    | 1740 | 3.2732          | 1.0       | 9.6816        | 1.1335 |
| 3.6954        | 31    | 1798 | 3.2618          | 1.0       | 10.1066       | 1.1724 |
| 3.6559        | 32    | 1856 | 3.2502          | 1.0       | 10.3752       | 1.1747 |
| 3.6351        | 33    | 1914 | 3.2424          | 1.0       | 10.3499       | 1.1920 |
| 3.622         | 34    | 1972 | 3.2301          | 1.0       | 10.6199       | 1.2463 |
| 3.5962        | 35    | 2030 | 3.2204          | 1.0       | 10.7700       | 1.2873 |
| 3.5807        | 36    | 2088 | 3.2108          | 1.0       | 10.9988       | 1.2177 |
| 3.5699        | 37    | 2146 | 3.2041          | 1.0       | 11.6971       | 1.2554 |
| 3.5291        | 38    | 2204 | 3.1918          | 1.0       | 11.6496       | 1.2671 |
| 3.5063        | 39    | 2262 | 3.1890          | 1.0       | 11.9242       | 1.2835 |
| 3.5058        | 40    | 2320 | 3.1768          | 1.0       | 9.8241        | 1.3632 |
| 3.4813        | 41    | 2378 | 3.1714          | 1.0       | 10.3065       | 1.3329 |
| 3.4609        | 42    | 2436 | 3.1621          | 1.0       | 10.6125       | 1.3574 |
| 3.454         | 43    | 2494 | 3.1562          | 1.0       | 10.9934       | 1.3835 |
| 3.4304        | 44    | 2552 | 3.1523          | 1.0       | 11.1347       | 1.3835 |
| 3.4036        | 45    | 2610 | 3.1419          | 1.0       | 11.0001       | 1.4078 |
| 3.4139        | 46    | 2668 | 3.1367          | 1.0       | 11.2890       | 1.5005 |
| 3.3835        | 47    | 2726 | 3.1305          | 1.0       | 11.2280       | 1.4660 |
| 3.3731        | 48    | 2784 | 3.1234          | 1.0       | 12.6309       | 1.4660 |
| 3.3645        | 49    | 2842 | 3.1187          | 1.0       | 11.7803       | 1.5160 |
| 3.3363        | 50    | 2900 | 3.1133          | 1.0       | 9.9173        | 1.4951 |

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model size

  • 0.6B params (Safetensors, F32 tensors)