33562f6b9fce67a8c253d38dc51d877c

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [fr-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5661
  • Data Size: 1.0
  • Epoch Runtime: 162.8726
  • Bleu: 0.7953

Model description

More information needed

Intended uses & limitations

More information needed
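
Absent author-provided details, here is a minimal inference sketch (not part of the original card), assuming the repository id contemmcm/33562f6b9fce67a8c253d38dc51d877c. Whether a T5-style task prefix (e.g. "translate French to Italian: ") is required depends on how the training script preprocessed opus_books [fr-it], which this card does not document.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical usage sketch; input formatting is an assumption, not documented in the card.
model_id = "contemmcm/33562f6b9fce67a8c253d38dc51d877c"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# French source sentence; a task prefix may or may not be needed depending on preprocessing.
text = "Bonjour, comment allez-vous ?"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```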

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a code sketch mirroring them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
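
For reference, a hedged sketch of how these settings map onto Seq2SeqTrainingArguments; the actual training script is not published with this card, and output_dir is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameter list above; this is a reconstruction, not the authors' script.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-opus-books-fr-it",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # x 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), epsilon=1e-08 are the listed values
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```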

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 227.9474        | 0         | 12.0214       | 0.0077 |
| No log        | 1     | 367   | 183.7979        | 0.0078    | 13.2936       | 0.0065 |
| No log        | 2     | 734   | 134.7391        | 0.0156    | 15.3314       | 0.0032 |
| No log        | 3     | 1101  | 69.4337         | 0.0312    | 18.8234       | 0.0011 |
| No log        | 4     | 1468  | 29.1306         | 0.0625    | 23.7774       | 0.0005 |
| 3.8408        | 5     | 1835  | 14.9830         | 0.125     | 33.6860       | 0.0155 |
| 20.8276       | 6     | 2202  | 11.2351         | 0.25      | 51.7824       | 0.1309 |
| 14.5113       | 7     | 2569  | 9.0706          | 0.5       | 89.2183       | 0.0839 |
| 10.3356       | 8     | 2936  | 6.5654          | 1.0       | 162.2343      | 0.0607 |
| 8.1005        | 9     | 3303  | 5.6250          | 1.0       | 160.6285      | 0.1078 |
| 6.9191        | 10    | 3670  | 4.6990          | 1.0       | 160.3902      | 0.1321 |
| 6.1299        | 11    | 4037  | 4.3165          | 1.0       | 159.7420      | 0.1422 |
| 5.4527        | 12    | 4404  | 3.9367          | 1.0       | 160.9828      | 0.1397 |
| 5.0146        | 13    | 4771  | 3.7508          | 1.0       | 160.3557      | 0.1540 |
| 4.7112        | 14    | 5138  | 3.5266          | 1.0       | 160.8869      | 0.2144 |
| 4.4185        | 15    | 5505  | 3.4124          | 1.0       | 160.9423      | 0.1896 |
| 4.2091        | 16    | 5872  | 3.3365          | 1.0       | 161.2574      | 0.2529 |
| 4.0543        | 17    | 6239  | 3.2665          | 1.0       | 161.5701      | 0.2257 |
| 3.908         | 18    | 6606  | 3.1801          | 1.0       | 160.9361      | 0.2362 |
| 3.7735        | 19    | 6973  | 3.2065          | 1.0       | 161.9550      | 0.1961 |
| 3.692         | 20    | 7340  | 3.0479          | 1.0       | 161.8171      | 0.3107 |
| 3.581         | 21    | 7707  | 3.0544          | 1.0       | 163.2240      | 0.2429 |
| 3.5182        | 22    | 8074  | 2.9941          | 1.0       | 163.6884      | 0.3816 |
| 3.435         | 23    | 8441  | 2.9701          | 1.0       | 163.1442      | 0.3846 |
| 3.3766        | 24    | 8808  | 2.9647          | 1.0       | 162.1650      | 0.2901 |
| 3.3338        | 25    | 9175  | 2.9095          | 1.0       | 161.4591      | 0.3830 |
| 3.2912        | 26    | 9542  | 2.9067          | 1.0       | 161.0684      | 0.3462 |
| 3.2124        | 27    | 9909  | 2.8573          | 1.0       | 162.8073      | 0.4296 |
| 3.1835        | 28    | 10276 | 2.8323          | 1.0       | 160.5233      | 0.3739 |
| 3.1464        | 29    | 10643 | 2.8341          | 1.0       | 161.7101      | 0.3739 |
| 3.0991        | 30    | 11010 | 2.8026          | 1.0       | 161.8965      | 0.5228 |
| 3.059         | 31    | 11377 | 2.7899          | 1.0       | 161.3309      | 0.4951 |
| 3.0268        | 32    | 11744 | 2.7772          | 1.0       | 162.3818      | 0.5396 |
| 2.9941        | 33    | 12111 | 2.7470          | 1.0       | 160.4514      | 0.6003 |
| 2.9486        | 34    | 12478 | 2.7180          | 1.0       | 160.8334      | 0.5596 |
| 2.9266        | 35    | 12845 | 2.7145          | 1.0       | 161.7326      | 0.6708 |
| 2.8847        | 36    | 13212 | 2.7008          | 1.0       | 162.2701      | 0.5416 |
| 2.8645        | 37    | 13579 | 2.6849          | 1.0       | 162.0892      | 0.6175 |
| 2.8496        | 38    | 13946 | 2.6890          | 1.0       | 162.7706      | 0.5850 |
| 2.81          | 39    | 14313 | 2.6759          | 1.0       | 162.3887      | 0.6315 |
| 2.7744        | 40    | 14680 | 2.6479          | 1.0       | 161.6776      | 0.6514 |
| 2.754         | 41    | 15047 | 2.6493          | 1.0       | 162.5414      | 0.6169 |
| 2.7357        | 42    | 15414 | 2.6309          | 1.0       | 163.1594      | 0.6945 |
| 2.686         | 43    | 15781 | 2.6149          | 1.0       | 165.9331      | 0.6736 |
| 2.6839        | 44    | 16148 | 2.6144          | 1.0       | 163.4045      | 0.6818 |
| 2.662         | 45    | 16515 | 2.6034          | 1.0       | 161.7051      | 0.6951 |
| 2.6385        | 46    | 16882 | 2.5922          | 1.0       | 162.0199      | 0.7377 |
| 2.6123        | 47    | 17249 | 2.5817          | 1.0       | 160.6540      | 0.6980 |
| 2.5896        | 48    | 17616 | 2.5717          | 1.0       | 161.6729      | 0.7286 |
| 2.5732        | 49    | 17983 | 2.5840          | 1.0       | 162.0004      | 0.7646 |
| 2.5499        | 50    | 18350 | 2.5661          | 1.0       | 162.8726      | 0.7953 |
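
For completeness, a minimal sketch of scoring translations with sacreBLEU via the `evaluate` library; the card does not state which BLEU implementation or scale produced the column above, so this is illustrative only.

```python
import evaluate

# Illustrative only; the metric used to produce the Bleu column above is not documented.
metric = evaluate.load("sacrebleu")
predictions = ["Ciao, come stai?"]   # decoded model outputs (made-up example)
references = [["Ciao, come va?"]]    # gold Italian references (made-up example)
print(metric.compute(predictions=predictions, references=references)["score"])
```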

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1