93dac85b8013dcf7e10d3d38ebb59e2a

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [en-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2872
  • Data Size: 1.0
  • Epoch Runtime: 349.0586
  • Bleu: 1.8235
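
Below is a minimal inference sketch in Python, assuming the checkpoint is published under the repo id shown on this page; it is not part of the original card. Whether a task prefix (e.g. "translate English to Italian: ") was used during preprocessing is not documented, so none is added here.

```python
# Minimal usage sketch (not from the card). If fine-tuning used a source
# prefix, prepend it to the input text before tokenizing.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/93dac85b8013dcf7e10d3d38ebb59e2a"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("The book is on the table.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```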

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
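
For reference, a sketch of how these hyperparameters might map onto transformers' Seq2SeqTrainingArguments. The actual training script is not part of this card, so anything beyond the values listed above (including the output directory name) is an assumption.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: mirrors the listed hyperparameters. With 4 GPUs, per-device
# batch sizes of 8 yield the listed total batch size of 32 (8 x 4, no
# gradient accumulation). The multi-GPU launch itself (e.g. torchrun or
# accelerate) is handled outside these arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-it",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```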

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 235.5034 | 0 | 25.3887 | 0.0103 |
| No log | 1 | 808 | 155.7481 | 0.0078 | 28.3509 | 0.0090 |
| No log | 2 | 1616 | 85.2883 | 0.0156 | 31.4181 | 0.0014 |
| No log | 3 | 2424 | 28.7324 | 0.0312 | 37.6448 | 0.0034 |
| 2.4881 | 4 | 3232 | 15.2552 | 0.0625 | 48.8077 | 0.1189 |
| 21.1557 | 5 | 4040 | 12.0810 | 0.125 | 69.8835 | 0.2116 |
| 14.5938 | 6 | 4848 | 8.4854 | 0.25 | 110.5674 | 0.0080 |
| 9.8749 | 7 | 5656 | 6.7510 | 0.5 | 188.5840 | 0.0297 |
| 6.8205 | 8 | 6464 | 4.6572 | 1.0 | 347.2873 | 0.0850 |
| 5.4199 | 9 | 7272 | 3.9670 | 1.0 | 348.2967 | 0.1064 |
| 4.6668 | 10 | 8080 | 3.4696 | 1.0 | 345.4658 | 0.1911 |
| 4.2211 | 11 | 8888 | 3.2600 | 1.0 | 349.3905 | 0.2933 |
| 3.8767 | 12 | 9696 | 3.1371 | 1.0 | 349.5232 | 0.3225 |
| 3.6482 | 13 | 10504 | 3.0063 | 1.0 | 348.4620 | 0.4492 |
| 3.4935 | 14 | 11312 | 2.9206 | 1.0 | 345.3365 | 0.4697 |
| 3.3373 | 15 | 12120 | 2.8382 | 1.0 | 348.1898 | 0.6622 |
| 3.2697 | 16 | 12928 | 2.7994 | 1.0 | 348.8646 | 0.4981 |
| 3.1322 | 17 | 13736 | 2.7335 | 1.0 | 345.8731 | 0.6365 |
| 3.0663 | 18 | 14544 | 2.6849 | 1.0 | 349.8127 | 0.7391 |
| 2.9875 | 19 | 15352 | 2.6387 | 1.0 | 349.2950 | 0.7679 |
| 2.9183 | 20 | 16160 | 2.6118 | 1.0 | 351.4093 | 0.8138 |
| 2.8317 | 21 | 16968 | 2.5693 | 1.0 | 347.5031 | 0.9289 |
| 2.812 | 22 | 17776 | 2.5282 | 1.0 | 346.9961 | 0.9154 |
| 2.7867 | 23 | 18584 | 2.5255 | 1.0 | 349.6256 | 0.9863 |
| 2.683 | 24 | 19392 | 2.4789 | 1.0 | 351.8222 | 1.0570 |
| 2.6524 | 25 | 20200 | 2.4598 | 1.0 | 346.3992 | 1.0322 |
| 2.5791 | 26 | 21008 | 2.4307 | 1.0 | 347.3062 | 1.1019 |
| 2.5693 | 27 | 21816 | 2.4167 | 1.0 | 348.5370 | 1.1795 |
| 2.4995 | 28 | 22624 | 2.4055 | 1.0 | 347.6225 | 1.1704 |
| 2.4903 | 29 | 23432 | 2.3919 | 1.0 | 349.8536 | 1.2286 |
| 2.4173 | 30 | 24240 | 2.3657 | 1.0 | 351.1305 | 1.2316 |
| 2.4013 | 31 | 25048 | 2.3593 | 1.0 | 350.8515 | 1.3380 |
| 2.3653 | 32 | 25856 | 2.3391 | 1.0 | 347.7500 | 1.3574 |
| 2.3307 | 33 | 26664 | 2.3380 | 1.0 | 347.4459 | 1.3520 |
| 2.275 | 34 | 27472 | 2.3258 | 1.0 | 348.2127 | 1.4165 |
| 2.2421 | 35 | 28280 | 2.3144 | 1.0 | 350.9273 | 1.4770 |
| 2.2205 | 36 | 29088 | 2.2968 | 1.0 | 348.9591 | 1.5379 |
| 2.1887 | 37 | 29896 | 2.2962 | 1.0 | 347.4848 | 1.5300 |
| 2.1615 | 38 | 30704 | 2.2939 | 1.0 | 347.7210 | 1.5632 |
| 2.0893 | 39 | 31512 | 2.2868 | 1.0 | 346.9154 | 1.5886 |
| 2.0711 | 40 | 32320 | 2.2774 | 1.0 | 345.7959 | 1.5778 |
| 2.0476 | 41 | 33128 | 2.2797 | 1.0 | 350.1582 | 1.6413 |
| 2.0284 | 42 | 33936 | 2.2766 | 1.0 | 347.6871 | 1.6602 |
| 1.9713 | 43 | 34744 | 2.2738 | 1.0 | 345.1102 | 1.7006 |
| 1.9567 | 44 | 35552 | 2.2731 | 1.0 | 349.2087 | 1.7527 |
| 1.9156 | 45 | 36360 | 2.2725 | 1.0 | 347.9890 | 1.7388 |
| 1.9013 | 46 | 37168 | 2.2802 | 1.0 | 347.8241 | 1.7679 |
| 1.8791 | 47 | 37976 | 2.2744 | 1.0 | 347.6741 | 1.8353 |
| 1.8446 | 48 | 38784 | 2.2814 | 1.0 | 349.0220 | 1.8127 |
| 1.801 | 49 | 39592 | 2.2872 | 1.0 | 349.0586 | 1.8235 |
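
The card does not state which BLEU implementation produced the scores above; the sketch below assumes sacrebleu (which reports on a 0-100 scale) via the `evaluate` library, as is common for transformers translation fine-tunes.

```python
# Sketch of a BLEU computation of the kind reported above; sacrebleu is an
# assumption, and the sentences here are placeholders, not evaluation data.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Il libro è sul tavolo."]        # model outputs
references = [["Il libro è sopra il tavolo."]]  # one list of references per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```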

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1