862ec78bac8109fd96f6d604da3f9089

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [en-sv] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0722
  • Data Size: 1.0
  • Epoch Runtime: 39.7266
  • BLEU: 0.5534

Model description

More information needed

Intended uses & limitations

More information needed
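The card leaves usage undocumented, but since this is a LongT5 seq2seq checkpoint fine-tuned for English→Swedish translation, it should load with the standard Transformers classes. A minimal inference sketch, assuming the repository ID shown on this page and that no task prefix is required (the training preprocessing is not documented):

```python
# Minimal inference sketch; assumes standard LongT5 seq2seq conventions.
# Whether a task prefix was used during fine-tuning is not documented.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/862ec78bac8109fd96f6d604da3f9089"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate English -> Swedish.
inputs = tokenizer("The book lay open on the table.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```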

Training and evaluation data

More information needed
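The parallel corpus named above is public, so the data can be inspected even though the exact split used here is not documented. A sketch of loading it with the datasets library (the 10% validation holdout below is an illustrative assumption, not the card's actual split):

```python
# Sketch: load the en-sv book-translation pairs named on this card.
# opus_books ships a single "train" split; the 10% holdout below is an
# illustrative assumption, not the split actually used for this model.
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "en-sv")
split = ds["train"].train_test_split(test_size=0.1, seed=42)
print(split["train"][0]["translation"])  # {'en': '...', 'sv': '...'}
```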

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto the Trainer API follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
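A hedged sketch of how the listed values map onto Seq2SeqTrainingArguments; the per-device batch size of 8 across 4 GPUs yields the reported total of 32, and the output directory name is hypothetical:

```python
# Sketch mapping the listed hyperparameters onto the HF Trainer API.
# A 4-GPU distributed launch (e.g. torchrun) is assumed; that is what
# turns per_device_train_batch_size=8 into the total batch size of 32.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="longt5-opus-books-en-sv",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```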

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 223.9917 | 0 | 3.2371 | 0.0024 |
| No log | 1 | 77 | 202.5292 | 0.0078 | 3.8995 | 0.0043 |
| No log | 2 | 154 | 178.5526 | 0.0156 | 5.0033 | 0.0026 |
| No log | 3 | 231 | 154.7477 | 0.0312 | 6.6561 | 0.0027 |
| No log | 4 | 308 | 122.7218 | 0.0625 | 8.8027 | 0.0024 |
| No log | 5 | 385 | 71.8897 | 0.125 | 12.3707 | 0.0040 |
| 11.0887 | 6 | 462 | 29.8310 | 0.25 | 16.7527 | 0.0041 |
| 15.6486 | 7 | 539 | 15.4986 | 0.5 | 25.3639 | 0.0030 |
| 20.3585 | 8 | 616 | 11.0047 | 1.0 | 40.4686 | 0.0045 |
| 17.1318 | 9 | 693 | 9.3429 | 1.0 | 40.0204 | 0.0144 |
| 13.9398 | 10 | 770 | 8.7452 | 1.0 | 40.0600 | 0.0107 |
| 12.9449 | 11 | 847 | 7.9409 | 1.0 | 41.2014 | 0.0120 |
| 11.4343 | 12 | 924 | 7.5845 | 1.0 | 40.5638 | 0.0138 |
| 10.4966 | 13 | 1001 | 6.5332 | 1.0 | 40.3904 | 0.0194 |
| 10.1049 | 14 | 1078 | 6.4489 | 1.0 | 40.2994 | 0.0356 |
| 9.3497 | 15 | 1155 | 6.2815 | 1.0 | 39.4442 | 0.0545 |
| 9.0683 | 16 | 1232 | 5.8145 | 1.0 | 39.6385 | 0.0658 |
| 8.4766 | 17 | 1309 | 5.8225 | 1.0 | 39.6333 | 0.0533 |
| 8.2237 | 18 | 1386 | 5.3779 | 1.0 | 39.1000 | 0.1326 |
| 7.806 | 19 | 1463 | 5.2641 | 1.0 | 39.4449 | 0.1065 |
| 7.5591 | 20 | 1540 | 5.1220 | 1.0 | 39.8956 | 0.0928 |
| 7.1678 | 21 | 1617 | 5.0954 | 1.0 | 38.9583 | 0.0739 |
| 7.0647 | 22 | 1694 | 4.8270 | 1.0 | 39.4165 | 0.0900 |
| 6.6655 | 23 | 1771 | 4.7228 | 1.0 | 38.8910 | 0.1248 |
| 6.4936 | 24 | 1848 | 4.5843 | 1.0 | 39.0794 | 0.1653 |
| 6.2763 | 25 | 1925 | 4.6011 | 1.0 | 38.8424 | 0.2135 |
| 6.0459 | 26 | 2002 | 4.3207 | 1.0 | 39.4661 | 0.2475 |
| 5.8962 | 27 | 2079 | 4.1732 | 1.0 | 38.6549 | 0.2606 |
| 5.6867 | 28 | 2156 | 4.2663 | 1.0 | 39.7735 | 0.2386 |
| 5.5789 | 29 | 2233 | 4.2129 | 1.0 | 38.3106 | 0.1862 |
| 5.3694 | 30 | 2310 | 4.0500 | 1.0 | 39.1182 | 0.3258 |
| 5.2994 | 31 | 2387 | 3.8977 | 1.0 | 39.5137 | 0.2466 |
| 5.097 | 32 | 2464 | 3.7638 | 1.0 | 39.5681 | 0.3649 |
| 5.0502 | 33 | 2541 | 3.6888 | 1.0 | 38.6806 | 0.4253 |
| 4.9306 | 34 | 2618 | 3.8535 | 1.0 | 39.7200 | 0.2963 |
| 4.7802 | 35 | 2695 | 3.6710 | 1.0 | 38.4306 | 0.4275 |
| 4.6324 | 36 | 2772 | 3.6098 | 1.0 | 38.7097 | 0.3102 |
| 4.6086 | 37 | 2849 | 3.6183 | 1.0 | 38.4791 | 0.3835 |
| 4.4612 | 38 | 2926 | 3.4896 | 1.0 | 39.1358 | 0.4617 |
| 4.3899 | 39 | 3003 | 3.4306 | 1.0 | 39.7051 | 0.4606 |
| 4.3095 | 40 | 3080 | 3.4122 | 1.0 | 38.4065 | 0.4714 |
| 4.1906 | 41 | 3157 | 3.3982 | 1.0 | 39.5659 | 0.3783 |
| 4.1157 | 42 | 3234 | 3.3071 | 1.0 | 39.3906 | 0.5062 |
| 4.0427 | 43 | 3311 | 3.3112 | 1.0 | 39.0938 | 0.4346 |
| 3.9858 | 44 | 3388 | 3.3214 | 1.0 | 39.8291 | 0.4165 |
| 3.894 | 45 | 3465 | 3.3178 | 1.0 | 39.3775 | 0.4522 |
| 3.8483 | 46 | 3542 | 3.2357 | 1.0 | 40.2369 | 0.4297 |
| 3.7456 | 47 | 3619 | 3.1341 | 1.0 | 39.2569 | 0.5572 |
| 3.7164 | 48 | 3696 | 3.0997 | 1.0 | 39.4336 | 0.6396 |
| 3.6318 | 49 | 3773 | 3.0787 | 1.0 | 40.4189 | 0.5809 |
| 3.5901 | 50 | 3850 | 3.0722 | 1.0 | 39.7266 | 0.5534 |
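The card does not say which scorer produced the BLEU column or whether it is on a 0-1 or 0-100 scale. A common choice in Transformers translation examples is sacrebleu via the evaluate library; a sketch under that assumption:

```python
# Sketch: BLEU scoring with evaluate's sacrebleu wrapper. Whether this
# card's training loop used the same metric (or the same scale) is not
# documented, so treat this as illustrative only.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Boken låg öppen på bordet."]        # model outputs
references = [["Boken låg uppslagen på bordet."]]   # one reference list per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacrebleu reports on a 0-100 scale
```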

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model weights

  • Format: Safetensors
  • Model size: 0.8B params
  • Tensor type: F32