038d5148187e070b975e9031342ef73a

This model is a fine-tuned version of google/long-t5-local-large on the fr-pt (French-Portuguese) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5345
  • Data Size: 1.0 (fraction of the training set used)
  • Epoch Runtime: 20.1158
  • BLEU: 0.7389
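
For quick inspection, the checkpoint loads like any other seq2seq Transformers model. The snippet below is a minimal sketch: the repository id comes from this card, but the expected input format (plain French source text with no task prefix) is an assumption, since the card does not document it.

```python
# Minimal inference sketch. The repo id comes from this card; feeding the
# French source sentence directly (no task prefix) is an assumption.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/038d5148187e070b975e9031342ef73a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```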

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
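
The card names the dataset but not the preprocessing or the train/evaluation split. A plausible loading sketch follows; the 90/10 split is an assumption, since Helsinki-NLP/opus_books ships only a train split and the card does not say how the evaluation set was derived.

```python
# Loading sketch for the named dataset. The 90/10 split and seed are
# assumptions; opus_books ships only a "train" split.
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "fr-pt")
splits = ds["train"].train_test_split(test_size=0.1, seed=42)
print(splits["train"][0]["translation"])  # {'fr': '...', 'pt': '...'}
```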

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
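
These values map directly onto Hugging Face Seq2SeqTrainingArguments. The sketch below is a hedged reconstruction rather than the published training script; the output directory name and the predict_with_generate flag are assumptions (the latter is the usual way BLEU gets computed during evaluation).

```python
# Hedged reconstruction of the hyperparameters listed above; not the
# author's actual script. output_dir and predict_with_generate are guesses.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-fr-pt",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed, to compute BLEU at eval time
)
```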

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 236.7255 | 0 | 1.7678 | 0.0204 |
| No log | 1 | 31 | 219.1461 | 0.0078 | 2.4638 | 0.0188 |
| No log | 2 | 62 | 201.9752 | 0.0156 | 3.1682 | 0.0124 |
| No log | 3 | 93 | 187.6383 | 0.0312 | 4.4594 | 0.0122 |
| No log | 4 | 124 | 163.7096 | 0.0625 | 6.4885 | 0.0215 |
| No log | 5 | 155 | 130.6493 | 0.125 | 8.8677 | 0.0198 |
| No log | 6 | 186 | 79.7999 | 0.25 | 11.0911 | 0.0040 |
| 19.2232 | 7 | 217 | 37.5892 | 0.5 | 15.1334 | 0.0052 |
| 19.2232 | 8 | 248 | 17.3204 | 1.0 | 21.9644 | 0.0283 |
| 27.1063 | 9 | 279 | 14.9082 | 1.0 | 20.2933 | 0.2114 |
| 23.9019 | 10 | 310 | 12.7926 | 1.0 | 20.2024 | 0.2739 |
| 23.9019 | 11 | 341 | 11.3982 | 1.0 | 19.4193 | 0.2784 |
| 18.8143 | 12 | 372 | 10.3658 | 1.0 | 19.5572 | 0.0558 |
| 16.1903 | 13 | 403 | 9.2900 | 1.0 | 20.0680 | 0.1722 |
| 16.1903 | 14 | 434 | 9.6809 | 1.0 | 20.0169 | 0.0760 |
| 14.6433 | 15 | 465 | 8.9208 | 1.0 | 19.4845 | 0.0727 |
| 14.6433 | 16 | 496 | 8.0822 | 1.0 | 19.7651 | 0.0400 |
| 13.4211 | 17 | 527 | 8.3107 | 1.0 | 19.3867 | 0.0532 |
| 12.4357 | 18 | 558 | 8.0146 | 1.0 | 19.9804 | 0.0798 |
| 12.4357 | 19 | 589 | 7.4550 | 1.0 | 20.1282 | 0.2350 |
| 11.5437 | 20 | 620 | 7.0633 | 1.0 | 20.3499 | 0.3080 |
| 10.7571 | 21 | 651 | 6.6771 | 1.0 | 19.6223 | 0.2713 |
| 10.7571 | 22 | 682 | 6.4946 | 1.0 | 19.9972 | 0.3405 |
| 10.1704 | 23 | 713 | 6.5607 | 1.0 | 19.7047 | 0.3599 |
| 10.1704 | 24 | 744 | 6.3228 | 1.0 | 20.5115 | 0.4006 |
| 9.5748 | 25 | 775 | 6.2065 | 1.0 | 20.2679 | 0.4567 |
| 9.1474 | 26 | 806 | 6.1813 | 1.0 | 19.4314 | 0.2706 |
| 9.1474 | 27 | 837 | 6.0840 | 1.0 | 19.2637 | 0.3095 |
| 8.7856 | 28 | 868 | 5.8688 | 1.0 | 19.8345 | 0.4495 |
| 8.7856 | 29 | 899 | 5.6269 | 1.0 | 19.7177 | 0.5039 |
| 8.3737 | 30 | 930 | 5.6042 | 1.0 | 19.7356 | 0.5152 |
| 8.0599 | 31 | 961 | 5.6617 | 1.0 | 19.7495 | 0.4316 |
| 8.0599 | 32 | 992 | 5.7713 | 1.0 | 19.9040 | 0.5404 |
| 7.809 | 33 | 1023 | 5.4166 | 1.0 | 20.1333 | 0.5372 |
| 7.4953 | 34 | 1054 | 5.3529 | 1.0 | 19.5804 | 0.5465 |
| 7.4953 | 35 | 1085 | 5.4388 | 1.0 | 19.7150 | 0.5503 |
| 7.1911 | 36 | 1116 | 5.0335 | 1.0 | 19.7163 | 0.5790 |
| 7.1911 | 37 | 1147 | 5.1159 | 1.0 | 19.9428 | 0.6223 |
| 6.9716 | 38 | 1178 | 5.0591 | 1.0 | 20.3290 | 0.4831 |
| 6.7211 | 39 | 1209 | 4.8556 | 1.0 | 20.0206 | 0.6963 |
| 6.7211 | 40 | 1240 | 5.0075 | 1.0 | 20.1885 | 0.6027 |
| 6.5236 | 41 | 1271 | 4.7169 | 1.0 | 21.1392 | 0.7565 |
| 6.3467 | 42 | 1302 | 4.7961 | 1.0 | 20.7119 | 0.6435 |
| 6.3467 | 43 | 1333 | 4.6043 | 1.0 | 19.7915 | 0.6827 |
| 6.1671 | 44 | 1364 | 4.5811 | 1.0 | 19.8226 | 0.7857 |
| 6.1671 | 45 | 1395 | 4.7868 | 1.0 | 19.8787 | 0.6117 |
| 6.0261 | 46 | 1426 | 4.5726 | 1.0 | 20.7483 | 0.6840 |
| 5.8656 | 47 | 1457 | 4.4508 | 1.0 | 20.1728 | 0.7430 |
| 5.8656 | 48 | 1488 | 4.4236 | 1.0 | 19.9059 | 0.8194 |
| 5.6751 | 49 | 1519 | 4.4156 | 1.0 | 20.3834 | 0.7308 |
| 5.5485 | 50 | 1550 | 4.5345 | 1.0 | 20.1158 | 0.7389 |
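
The BLEU column appears to be on a 0-1 scale, and the card does not state which implementation produced it. A hypothetical check with the evaluate library's bleu metric, which reports on the same 0-1 scale:

```python
# Hypothetical BLEU computation; the card does not say which BLEU
# implementation produced the column above, and these sentences are made up.
import evaluate

bleu = evaluate.load("bleu")
predictions = ["o gato está no tapete"]        # hypothetical model output
references = [["o gato está sobre o tapete"]]  # hypothetical reference
print(bleu.compute(predictions=predictions, references=references)["bleu"])
```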

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Model size: 0.8B parameters (F32, Safetensors)