04357355773e8961693280f56a1449a0

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [fr-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3710
  • Data Size: 1.0
  • Epoch Runtime: 54.0672
  • Bleu: 4.7280
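Since the card gives no usage snippet, here is a minimal, hedged sketch of loading the checkpoint for fr→it translation with the standard Transformers seq2seq API. The repo id is taken from this card's own page; generation settings (beam count, token budget) are illustrative choices, not documented by the author.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id as shown on this model card's page.
model_id = "contemmcm/04357355773e8961693280f56a1449a0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate a French sentence to Italian (assumed input/output convention
# for an fr-it fine-tune; the card does not document a task prefix).
inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
```

Given the final BLEU of ~4.7, output quality will be rough; treat this as a smoke test rather than a production translator.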

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments

  • lr_scheduler_type: constant
  • num_epochs: 50

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 25.1154 | 0 | 5.3825 | 0.0016 |
| No log | 1 | 367 | 23.2656 | 0.0078 | 6.6290 | 0.0018 |
| No log | 2 | 734 | 20.2779 | 0.0156 | 6.4812 | 0.0032 |
| No log | 3 | 1101 | 16.9192 | 0.0312 | 7.7052 | 0.0029 |
| No log | 4 | 1468 | 12.3286 | 0.0625 | 8.8945 | 0.0057 |
| 0.9075 | 5 | 1835 | 8.1736 | 0.125 | 11.9506 | 0.0114 |
| 8.2208 | 6 | 2202 | 4.8777 | 0.25 | 18.2376 | 0.0163 |
| 4.8968 | 7 | 2569 | 3.4010 | 0.5 | 30.3690 | 0.4978 |
| 4.1438 | 8 | 2936 | 3.1225 | 1.0 | 55.0431 | 1.2713 |
| 3.8377 | 9 | 3303 | 2.9898 | 1.0 | 55.7866 | 1.6002 |
| 3.6606 | 10 | 3670 | 2.9150 | 1.0 | 55.9007 | 1.8640 |
| 3.5493 | 11 | 4037 | 2.8568 | 1.0 | 56.5953 | 2.1198 |
| 3.4442 | 12 | 4404 | 2.8096 | 1.0 | 55.7102 | 2.2761 |
| 3.403 | 13 | 4771 | 2.7733 | 1.0 | 54.4628 | 2.4571 |
| 3.3022 | 14 | 5138 | 2.7394 | 1.0 | 55.6655 | 2.5954 |
| 3.2628 | 15 | 5505 | 2.7122 | 1.0 | 54.6192 | 2.7575 |
| 3.1883 | 16 | 5872 | 2.6843 | 1.0 | 55.1374 | 2.8996 |
| 3.1465 | 17 | 6239 | 2.6687 | 1.0 | 54.9305 | 2.9907 |
| 3.1012 | 18 | 6606 | 2.6433 | 1.0 | 55.3830 | 3.0827 |
| 3.0873 | 19 | 6973 | 2.6317 | 1.0 | 56.0201 | 3.1600 |
| 3.0363 | 20 | 7340 | 2.6021 | 1.0 | 54.8430 | 3.2487 |
| 2.9852 | 21 | 7707 | 2.5877 | 1.0 | 55.4414 | 3.3326 |
| 2.9617 | 22 | 8074 | 2.5818 | 1.0 | 55.4208 | 3.4285 |
| 2.9161 | 23 | 8441 | 2.5707 | 1.0 | 54.6723 | 3.4842 |
| 2.8687 | 24 | 8808 | 2.5546 | 1.0 | 54.5485 | 3.5617 |
| 2.8744 | 25 | 9175 | 2.5502 | 1.0 | 54.6917 | 3.6545 |
| 2.8176 | 26 | 9542 | 2.5291 | 1.0 | 54.6620 | 3.7157 |
| 2.7957 | 27 | 9909 | 2.5210 | 1.0 | 54.5980 | 3.8414 |
| 2.7676 | 28 | 10276 | 2.5061 | 1.0 | 56.9414 | 3.8788 |
| 2.7649 | 29 | 10643 | 2.5044 | 1.0 | 56.3715 | 3.9675 |
| 2.743 | 30 | 11010 | 2.4944 | 1.0 | 55.0688 | 4.0330 |
| 2.697 | 31 | 11377 | 2.4803 | 1.0 | 54.9879 | 4.1151 |
| 2.7005 | 32 | 11744 | 2.4716 | 1.0 | 54.7681 | 4.1206 |
| 2.6596 | 33 | 12111 | 2.4614 | 1.0 | 54.9092 | 4.2020 |
| 2.6309 | 34 | 12478 | 2.4564 | 1.0 | 56.2777 | 4.2420 |
| 2.6054 | 35 | 12845 | 2.4494 | 1.0 | 55.9621 | 4.2832 |
| 2.6204 | 36 | 13212 | 2.4498 | 1.0 | 55.7327 | 4.3257 |
| 2.586 | 37 | 13579 | 2.4346 | 1.0 | 55.3127 | 4.3834 |
| 2.583 | 38 | 13946 | 2.4379 | 1.0 | 55.0265 | 4.4316 |
| 2.5634 | 39 | 14313 | 2.4232 | 1.0 | 55.3696 | 4.4535 |
| 2.5348 | 40 | 14680 | 2.4112 | 1.0 | 54.8842 | 4.5320 |
| 2.5346 | 41 | 15047 | 2.4179 | 1.0 | 56.2262 | 4.5211 |
| 2.5066 | 42 | 15414 | 2.4176 | 1.0 | 55.7012 | 4.6158 |
| 2.4928 | 43 | 15781 | 2.4052 | 1.0 | 55.0442 | 4.6144 |
| 2.4923 | 44 | 16148 | 2.4027 | 1.0 | 54.6755 | 4.6431 |
| 2.4631 | 45 | 16515 | 2.3979 | 1.0 | 54.2127 | 4.6744 |
| 2.4756 | 46 | 16882 | 2.3845 | 1.0 | 57.4652 | 4.6653 |
| 2.4326 | 47 | 17249 | 2.3916 | 1.0 | 54.8645 | 4.6903 |
| 2.4329 | 48 | 17616 | 2.3900 | 1.0 | 54.0670 | 4.6948 |
| 2.3847 | 49 | 17983 | 2.3755 | 1.0 | 53.8656 | 4.7380 |
| 2.4299 | 50 | 18350 | 2.3710 | 1.0 | 54.0672 | 4.7280 |
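The Data Size column above doubles each epoch (0.0078 ≈ 1/128 at epoch 1) until the full set is used from epoch 8 onward, which explains the short early epoch runtimes. The schedule is not documented in the card, but the pattern it implies can be sketched as:

```python
def data_fraction(epoch: int) -> float:
    """Fraction of the training set used at a given epoch, inferred
    from the Data Size column: it doubles each epoch from 1/128 at
    epoch 1 until the full dataset is reached at epoch 8."""
    if epoch == 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - 8))

# Matches the table: 1/128 at epoch 1, 0.125 at epoch 5, 1.0 from epoch 8 on.
print([data_fraction(e) for e in (1, 5, 7, 8, 20)])
```

This is a reverse-engineered description of the logged values, not the author's confirmed training code.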

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
