78c43c296753607a367bd75d67a2b57d

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [fi-pl] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0414
  • Data Size: 1.0
  • Epoch Runtime: 36.2348
  • Bleu: 0.0885

Model description

More information needed

Intended uses & limitations

More information needed
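
Pending fuller documentation, below is a minimal inference sketch. It assumes the checkpoint is published as contemmcm/78c43c296753607a367bd75d67a2b57d and that the model translates Finnish to Polish without a task prefix; T5-family checkpoints fine-tuned for translation sometimes expect a prefix such as `translate Finnish to Polish: `, so adjust if outputs look off.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repository id assumed from this card; replace with a local path if needed.
model_id = "contemmcm/78c43c296753607a367bd75d67a2b57d"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Finnish source sentence; the card's dataset pair is [fi-pl].
text = "Hyvää huomenta, ystäväni."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```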

Training and evaluation data

More information needed
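
While the card does not describe the data further, the dataset named at the top can be inspected directly. A short sketch follows; note that opus_books language pairs typically ship only a train split, so the evaluation set would have been carved out of it (the 10% test fraction here is a hypothetical choice, not taken from the card):

```python
from datasets import load_dataset

# The fi-pl configuration of the dataset named at the top of this card.
books = load_dataset("Helsinki-NLP/opus_books", "fi-pl")

# opus_books ships a single train split; split off an eval set, reusing
# the card's seed of 42 for reproducibility.
books = books["train"].train_test_split(test_size=0.1, seed=42)
print(books["train"][0]["translation"])  # {'fi': '...', 'pl': '...'}
```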

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
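
For reference, the list above maps onto the Trainer API roughly as follows. This is a reconstruction, not the original training script; output_dir is hypothetical, and the multi-GPU setup would come from the launcher (e.g. torchrun with 4 processes), which is why the per-device batch size of 8 yields the reported total of 32:

```python
from transformers import Seq2SeqTrainingArguments

# Reconstructed from the hyperparameter list above; output_dir is hypothetical.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-fi-pl",
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x4 GPUs -> total_train_batch_size 32
    per_device_eval_batch_size=8,    # x4 GPUs -> total_eval_batch_size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```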

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 209.2599        | 0         | 2.9343        | 0.0022 |
| No log        | 1     | 70   | 187.7119        | 0.0078    | 3.6677        | 0.0021 |
| No log        | 2     | 140  | 160.4909        | 0.0156    | 5.1776        | 0.0035 |
| No log        | 3     | 210  | 135.7296        | 0.0312    | 6.6340        | 0.0033 |
| No log        | 4     | 280  | 104.5299        | 0.0625    | 8.9982        | 0.0028 |
| No log        | 5     | 350  | 57.1903         | 0.125     | 12.2910       | 0.0035 |
| No log        | 6     | 420  | 28.1823         | 0.25      | 16.1746       | 0.0013 |
| 10.6872       | 7     | 490  | 14.6561         | 0.5       | 22.8352       | 0.0036 |
| 20.0982       | 8.0   | 560  | 10.7794         | 1.0       | 37.2661       | 0.0048 |
| 16.7674       | 9.0   | 630  | 8.8928          | 1.0       | 36.4249       | 0.0079 |
| 13.3888       | 10.0  | 700  | 8.2467          | 1.0       | 36.7575       | 0.0135 |
| 12.499        | 11.0  | 770  | 7.3715          | 1.0       | 36.2484       | 0.0158 |
| 11.725        | 12.0  | 840  | 6.9170          | 1.0       | 37.2235       | 0.0162 |
| 10.5242       | 13.0  | 910  | 6.5168          | 1.0       | 36.5129       | 0.0192 |
| 10.0615       | 14.0  | 980  | 6.2463          | 1.0       | 36.6950       | 0.0197 |
| 9.3333        | 15.0  | 1050 | 6.1689          | 1.0       | 36.1341       | 0.0191 |
| 8.96          | 16.0  | 1120 | 5.8733          | 1.0       | 36.9673       | 0.0211 |
| 8.6238        | 17.0  | 1190 | 5.5877          | 1.0       | 36.3308       | 0.0226 |
| 8.1025        | 18.0  | 1260 | 5.2607          | 1.0       | 36.7496       | 0.0207 |
| 7.8044        | 19.0  | 1330 | 5.3178          | 1.0       | 36.2681       | 0.0187 |
| 7.4109        | 20.0  | 1400 | 4.9554          | 1.0       | 36.5586       | 0.0334 |
| 7.2297        | 21.0  | 1470 | 4.7693          | 1.0       | 36.2652       | 0.0233 |
| 7.0329        | 22.0  | 1540 | 4.9162          | 1.0       | 36.9754       | 0.0461 |
| 6.6534        | 23.0  | 1610 | 4.3485          | 1.0       | 36.5360       | 0.0201 |
| 6.4951        | 24.0  | 1680 | 4.3899          | 1.0       | 36.1867       | 0.0439 |
| 6.1811        | 25.0  | 1750 | 4.3861          | 1.0       | 36.2829       | 0.0531 |
| 6.0734        | 26.0  | 1820 | 4.2726          | 1.0       | 36.4671       | 0.0478 |
| 5.9986        | 27.0  | 1890 | 4.1590          | 1.0       | 36.6922       | 0.0432 |
| 5.7188        | 28.0  | 1960 | 4.1682          | 1.0       | 36.0706       | 0.0406 |
| 5.5975        | 29.0  | 2030 | 3.9898          | 1.0       | 36.4282       | 0.0456 |
| 5.3677        | 30.0  | 2100 | 3.9291          | 1.0       | 35.9161       | 0.0461 |
| 5.3125        | 31.0  | 2170 | 3.7723          | 1.0       | 36.5840       | 0.0542 |
| 5.2008        | 32.0  | 2240 | 3.6980          | 1.0       | 36.7902       | 0.0614 |
| 4.999         | 33.0  | 2310 | 3.6551          | 1.0       | 36.8089       | 0.0701 |
| 4.9079        | 34.0  | 2380 | 3.5539          | 1.0       | 36.7155       | 0.0535 |
| 4.7541        | 35.0  | 2450 | 3.4654          | 1.0       | 36.7493       | 0.1156 |
| 4.6667        | 36.0  | 2520 | 3.4963          | 1.0       | 36.4739       | 0.0356 |
| 4.5992        | 37.0  | 2590 | 3.3897          | 1.0       | 36.0276       | 0.0715 |
| 4.4739        | 38.0  | 2660 | 3.3127          | 1.0       | 36.3668       | 0.0923 |
| 4.3799        | 39.0  | 2730 | 3.2270          | 1.0       | 37.5475       | 0.1129 |
| 4.2726        | 40.0  | 2800 | 3.2151          | 1.0       | 36.9524       | 0.1486 |
| 4.1815        | 41.0  | 2870 | 3.2228          | 1.0       | 36.4025       | 0.0813 |
| 4.1766        | 42.0  | 2940 | 3.2554          | 1.0       | 36.9824       | 0.0646 |
| 4.0409        | 43.0  | 3010 | 3.1722          | 1.0       | 36.4834       | 0.0750 |
| 3.9996        | 44.0  | 3080 | 3.1450          | 1.0       | 36.3839       | 0.0706 |
| 3.9136        | 45.0  | 3150 | 3.0476          | 1.0       | 36.4526       | 0.0890 |
| 3.8524        | 46.0  | 3220 | 3.0807          | 1.0       | 36.4076       | 0.0702 |
| 3.7894        | 47.0  | 3290 | 2.9933          | 1.0       | 37.0932       | 0.0943 |
| 3.6966        | 48.0  | 3360 | 2.9177          | 1.0       | 37.4004       | 0.1357 |
| 3.668         | 49.0  | 3430 | 2.9728          | 1.0       | 36.4877       | 0.0849 |
| 3.6084        | 50.0  | 3500 | 3.0414          | 1.0       | 36.2348       | 0.0885 |
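
The Bleu column is on a 0-1 scale. The card does not record which implementation produced it; below is a sketch using the evaluate library's bleu metric, which reports on the same scale. The sentences are hypothetical; real scoring would compare decoded model outputs against the Polish references from the evaluation set.

```python
import evaluate

bleu = evaluate.load("bleu")

# Hypothetical prediction/reference pair for illustration only.
predictions = ["Kot siedzi na macie."]
references = [["Kot siedzi na macie."]]
print(bleu.compute(predictions=predictions, references=references)["bleu"])
```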

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1