549aaa3ce14cc70304a30ca95b87074c

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [en-fi] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9341
  • Data Size: 1.0
  • Epoch Runtime (s): 46.6996
  • BLEU: 0.9318
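
The card ships without a usage example, so the following is a minimal inference sketch, assuming the hosted repository id contemmcm/549aaa3ce14cc70304a30ca95b87074c and an English→Finnish translation task inferred from the opus_books [en-fi] fine-tune; the example sentence is illustrative only:

```python
# Minimal inference sketch (not part of the original card).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/549aaa3ce14cc70304a30ca95b87074c"  # repo id from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# A T5-style task prefix may or may not be required, depending on how the
# (unpublished) training script preprocessed its inputs.
text = "The house stood on a hill overlooking the sea."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```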

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
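
The card leaves this section blank. For orientation, a sketch of loading the dataset named above; the split and preprocessing actually used for this run are undocumented:

```python
from datasets import load_dataset

# opus_books exposes a single "train" split per language pair; any
# train/validation partitioning used for this run is not documented.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-fi")
example = dataset["train"][0]["translation"]
print(example["en"], "->", example["fi"])
```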

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
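
The list above maps directly onto transformers' Seq2SeqTrainingArguments. A minimal sketch of that mapping follows; the actual training script is not published, and output_dir and predict_with_generate are assumptions:

```python
# Hypothetical reconstruction of the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-fi",  # placeholder name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = 32 total
    per_device_eval_batch_size=8,    # x 4 GPUs = 32 total
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumption: needed for BLEU evaluation
)
```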

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | BLEU   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-----------------:|:------:|
| No log        | 0     | 0    | 211.0632        | 0         | 3.5779            | 0.0016 |
| No log        | 1     | 91   | 188.7779        | 0.0078    | 4.1075            | 0.0019 |
| No log        | 2     | 182  | 169.3628        | 0.0156    | 5.8967            | 0.0022 |
| No log        | 3     | 273  | 149.8084        | 0.0312    | 7.5024            | 0.0022 |
| No log        | 4     | 364  | 117.3038        | 0.0625    | 9.7911            | 0.0021 |
| No log        | 5     | 455  | 67.4181         | 0.125     | 13.6960           | 0.0019 |
| No log        | 6     | 546  | 29.9486         | 0.25      | 18.0287           | 0.0028 |
| 9.5787        | 7     | 637  | 14.3971         | 0.5       | 28.2492           | 0.0080 |
| 19.5655       | 8     | 728  | 10.7885         | 1.0       | 47.3220           | 0.0082 |
| 14.8619       | 9     | 819  | 8.7578          | 1.0       | 46.3086           | 0.0080 |
| 12.7107       | 10    | 910  | 8.1946          | 1.0       | 46.3137           | 0.0144 |
| 11.1516       | 11    | 1001 | 7.0157          | 1.0       | 46.2185           | 0.0318 |
| 10.7591       | 12    | 1092 | 6.7510          | 1.0       | 45.4156           | 0.0190 |
| 9.8214        | 13    | 1183 | 6.1254          | 1.0       | 46.6275           | 0.0629 |
| 9.0841        | 14    | 1274 | 6.1219          | 1.0       | 45.7993           | 0.0495 |
| 8.5117        | 15    | 1365 | 5.4160          | 1.0       | 46.0273           | 0.0841 |
| 8.0009        | 16    | 1456 | 5.3270          | 1.0       | 46.2666           | 0.1435 |
| 7.7391        | 17    | 1547 | 5.1370          | 1.0       | 46.2303           | 0.1310 |
| 7.3406        | 18    | 1638 | 4.9944          | 1.0       | 46.1756           | 0.1327 |
| 6.9568        | 19    | 1729 | 4.6342          | 1.0       | 45.9808           | 0.2060 |
| 6.6468        | 20    | 1820 | 4.5277          | 1.0       | 46.7813           | 0.2704 |
| 6.3802        | 21    | 1911 | 4.3945          | 1.0       | 46.3165           | 0.2054 |
| 6.0991        | 22    | 2002 | 4.2401          | 1.0       | 46.1848           | 0.1644 |
| 5.9951        | 23    | 2093 | 4.0383          | 1.0       | 45.9308           | 0.3349 |
| 5.7585        | 24    | 2184 | 4.0444          | 1.0       | 46.1250           | 0.3045 |
| 5.5825        | 25    | 2275 | 4.0388          | 1.0       | 46.0497           | 0.2623 |
| 5.3925        | 26    | 2366 | 3.9272          | 1.0       | 45.7610           | 0.4223 |
| 5.2605        | 27    | 2457 | 3.7667          | 1.0       | 46.2910           | 0.3692 |
| 5.1225        | 28    | 2548 | 3.7613          | 1.0       | 45.6301           | 0.4512 |
| 4.9741        | 29    | 2639 | 3.6562          | 1.0       | 46.4286           | 0.3588 |
| 4.8417        | 30    | 2730 | 3.5531          | 1.0       | 46.1431           | 0.5768 |
| 4.6741        | 31    | 2821 | 3.6014          | 1.0       | 46.8742           | 0.4160 |
| 4.6209        | 32    | 2912 | 3.4647          | 1.0       | 46.7312           | 0.6189 |
| 4.4791        | 33    | 3003 | 3.4088          | 1.0       | 46.6921           | 0.7235 |
| 4.346         | 34    | 3094 | 3.3616          | 1.0       | 45.7728           | 0.5786 |
| 4.311         | 35    | 3185 | 3.3856          | 1.0       | 47.5342           | 0.5463 |
| 4.1986        | 36    | 3276 | 3.3260          | 1.0       | 46.0606           | 0.6678 |
| 4.1229        | 37    | 3367 | 3.2019          | 1.0       | 46.4823           | 0.7473 |
| 4.0017        | 38    | 3458 | 3.2359          | 1.0       | 46.6339           | 0.6610 |
| 3.9321        | 39    | 3549 | 3.1598          | 1.0       | 46.3126           | 0.7617 |
| 3.8785        | 40    | 3640 | 3.1266          | 1.0       | 45.4446           | 0.8805 |
| 3.7769        | 41    | 3731 | 3.0599          | 1.0       | 46.1482           | 0.7153 |
| 3.7148        | 42    | 3822 | 3.0918          | 1.0       | 45.4459           | 0.8746 |
| 3.6051        | 43    | 3913 | 3.0205          | 1.0       | 46.5096           | 0.9528 |
| 3.5761        | 44    | 4004 | 3.0466          | 1.0       | 46.2574           | 0.8161 |
| 3.5664        | 45    | 4095 | 2.9364          | 1.0       | 46.3092           | 1.0224 |
| 3.4499        | 46    | 4186 | 2.9338          | 1.0       | 45.6281           | 0.9838 |
| 3.3732        | 47    | 4277 | 2.9169          | 1.0       | 45.4441           | 0.9263 |
| 3.3295        | 48    | 4368 | 2.9877          | 1.0       | 46.7569           | 0.7916 |
| 3.2911        | 49    | 4459 | 2.9030          | 1.0       | 46.8190           | 0.7812 |
| 3.2407        | 50    | 4550 | 2.9341          | 1.0       | 46.6996           | 0.9318 |
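
The BLEU column implies that evaluation decodes generated predictions and scores them against reference translations; the magnitudes are consistent with sacreBLEU's 0–100 scale. The evaluation code is not published with the card, so the following compute_metrics hook (and the sacreBLEU choice itself) is an assumption:

```python
# Hypothetical metric hook; the card does not include the evaluation code.
import evaluate
import numpy as np

bleu = evaluate.load("sacrebleu")

def build_compute_metrics(tokenizer):
    def compute_metrics(eval_preds):
        preds, labels = eval_preds
        # Labels are padded with -100 by the data collator; restore the
        # pad token id before decoding.
        labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
        decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
        result = bleu.compute(
            predictions=decoded_preds,
            references=[[label] for label in decoded_labels],
        )
        return {"bleu": result["score"]}
    return compute_metrics
```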

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1