3a9bd0d94ea11766e0113a915c2b0f91

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [de-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5661
  • Data Size: 1.0
  • Epoch Runtime: 18.9176
  • Bleu: 0.8563

Model description

More information needed

Intended uses & limitations

More information needed
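
The card does not yet document usage, but the base model and language pair are known, so a minimal inference sketch is possible. The repo id below is taken from this card; whether the model expects raw German text or a task prefix is not documented, so plain text input is an assumption.

```python
# Minimal inference sketch for this checkpoint. Assumes the model accepts
# raw German text with no task prefix (the fine-tuning input format is
# not documented in this card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/3a9bd0d94ea11766e0113a915c2b0f91"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German -> Portuguese, matching the opus_books de-pt training pair.
inputs = tokenizer("Der Hund schläft im Garten.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```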

Training and evaluation data

More information needed
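
The corpus is named at the top of this card and can be loaded with the datasets library as a starting point. How the evaluation set was split off is not documented, so the split below is illustrative only.

```python
from datasets import load_dataset

# opus_books ships a single "train" split; the 90/10 split below is an
# illustrative assumption, not the split used for this card's results.
books = load_dataset("Helsinki-NLP/opus_books", "de-pt")
books = books["train"].train_test_split(test_size=0.1, seed=42)
print(books["train"][0]["translation"])  # {'de': '...', 'pt': '...'}
```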

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
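
A sketch of how these values map onto transformers Seq2SeqTrainingArguments, assuming a standard Trainer setup; the output directory is hypothetical, and the progressive "Data Size" ramp-up visible in the results table is custom behavior that stock arguments do not reproduce.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: hyperparameters are from the list above; output_dir is
# hypothetical, and the gradual data-size ramp-up seen in the results
# table is not captured by these stock arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-opus-books-de-pt",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,   # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumed, needed to compute Bleu during eval
)
```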

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 240.5146        | 0         | 1.5895        | 0.0052 |
| No log        | 1     | 27   | 220.2246        | 0.0078    | 2.3560        | 0.0049 |
| No log        | 2     | 54   | 201.6398        | 0.0156    | 3.4008        | 0.0046 |
| No log        | 3     | 81   | 185.7308        | 0.0312    | 4.7109        | 0.0043 |
| No log        | 4     | 108  | 160.8329        | 0.0625    | 6.1394        | 0.0044 |
| No log        | 5     | 135  | 126.0964        | 0.125     | 8.2139        | 0.0044 |
| No log        | 6     | 162  | 80.8064         | 0.25      | 9.8625        | 0.0078 |
| No log        | 7     | 189  | 38.3810         | 0.5       | 13.0379       | 0.0050 |
| 21.5309       | 8.0   | 216  | 20.5337         | 1.0       | 19.3391       | 0.0070 |
| 21.5309       | 9.0   | 243  | 15.6982         | 1.0       | 18.6150       | 0.0756 |
| 29.9211       | 10.0  | 270  | 13.9766         | 1.0       | 19.0162       | 0.0394 |
| 29.9211       | 11.0  | 297  | 12.7335         | 1.0       | 18.1427       | 0.0363 |
| 20.664        | 12.0  | 324  | 11.3376         | 1.0       | 18.4502       | 0.0323 |
| 17.2601       | 13.0  | 351  | 11.3032         | 1.0       | 19.0537       | 0.0499 |
| 17.2601       | 14.0  | 378  | 9.5367          | 1.0       | 18.0391       | 0.0428 |
| 15.1396       | 15.0  | 405  | 9.7215          | 1.0       | 18.4896       | 0.0435 |
| 15.1396       | 16.0  | 432  | 8.9985          | 1.0       | 18.2890       | 0.0741 |
| 13.6693       | 17.0  | 459  | 8.7178          | 1.0       | 18.5686       | 0.0452 |
| 13.6693       | 18.0  | 486  | 8.0204          | 1.0       | 18.3974       | 0.1564 |
| 12.5207       | 19.0  | 513  | 7.8316          | 1.0       | 18.2894       | 0.1316 |
| 12.5207       | 20.0  | 540  | 7.6137          | 1.0       | 18.3578       | 0.2135 |
| 11.6421       | 21.0  | 567  | 7.3559          | 1.0       | 19.0919       | 0.2169 |
| 11.6421       | 22.0  | 594  | 7.2481          | 1.0       | 18.2427       | 0.3064 |
| 10.8325       | 23.0  | 621  | 7.3813          | 1.0       | 18.1103       | 0.3937 |
| 10.8325       | 24.0  | 648  | 6.6429          | 1.0       | 18.8096       | 0.4088 |
| 10.1643       | 25.0  | 675  | 6.5005          | 1.0       | 18.8014       | 0.5785 |
| 9.6446        | 26.0  | 702  | 6.7132          | 1.0       | 19.0756       | 0.2251 |
| 9.6446        | 27.0  | 729  | 6.4120          | 1.0       | 19.3266       | 0.4350 |
| 9.1617        | 28.0  | 756  | 6.2314          | 1.0       | 18.5467       | 0.6547 |
| 9.1617        | 29.0  | 783  | 5.9144          | 1.0       | 18.3461       | 0.5681 |
| 8.6776        | 30.0  | 810  | 6.0467          | 1.0       | 18.5372       | 0.4646 |
| 8.6776        | 31.0  | 837  | 5.9735          | 1.0       | 18.6964       | 0.3894 |
| 8.3499        | 32.0  | 864  | 5.8220          | 1.0       | 19.0345       | 0.4047 |
| 8.3499        | 33.0  | 891  | 5.8745          | 1.0       | 18.9467       | 0.5463 |
| 7.9403        | 34.0  | 918  | 5.5877          | 1.0       | 19.5259       | 0.5067 |
| 7.9403        | 35.0  | 945  | 5.5054          | 1.0       | 18.3917       | 0.5121 |
| 7.68          | 36.0  | 972  | 5.3874          | 1.0       | 19.1240       | 0.6175 |
| 7.68          | 37.0  | 999  | 5.6432          | 1.0       | 18.7083       | 0.4894 |
| 7.3719        | 38.0  | 1026 | 5.4467          | 1.0       | 18.7202       | 0.6100 |
| 7.0996        | 39.0  | 1053 | 5.1762          | 1.0       | 18.9310       | 0.7550 |
| 7.0996        | 40.0  | 1080 | 5.5259          | 1.0       | 18.5685       | 0.5951 |
| 6.8975        | 41.0  | 1107 | 5.3219          | 1.0       | 18.5159       | 0.6732 |
| 6.8975        | 42.0  | 1134 | 4.9517          | 1.0       | 18.7211       | 0.5251 |
| 6.6581        | 43.0  | 1161 | 4.8695          | 1.0       | 18.6375       | 0.7514 |
| 6.6581        | 44.0  | 1188 | 5.0998          | 1.0       | 18.6603       | 0.9022 |
| 6.4351        | 45.0  | 1215 | 4.8481          | 1.0       | 18.9359       | 0.7697 |
| 6.4351        | 46.0  | 1242 | 5.0892          | 1.0       | 19.1512       | 0.5868 |
| 6.2412        | 47.0  | 1269 | 4.8191          | 1.0       | 18.7786       | 0.7112 |
| 6.2412        | 48.0  | 1296 | 4.7702          | 1.0       | 18.8725       | 0.8425 |
| 6.0237        | 49.0  | 1323 | 4.5152          | 1.0       | 18.7832       | 0.6652 |
| 5.8837        | 50.0  | 1350 | 4.5661          | 1.0       | 18.9176       | 0.8563 |
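
The exact metric configuration behind the Bleu column is not documented. A common way to score translations is the evaluate wrapper around sacrebleu, shown below as a sketch with made-up sentences; note that sacrebleu reports on a 0-100 scale, so whether the values above are directly comparable is an assumption.

```python
import evaluate

# Sketch with made-up sentences; the metric settings used for this
# card's Bleu column are not documented.
bleu = evaluate.load("sacrebleu")
predictions = ["O cão dorme no jardim."]
references = [["O cachorro dorme no jardim."]]
print(bleu.compute(predictions=predictions, references=references)["score"])
```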

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1