5a559ef3e17402ca9cdf30875872b989

This model is a fine-tuned version of google/long-t5-local-large on the en-pl configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1867
  • Data Size: 1.0
  • Epoch Runtime: 36.3343
  • Bleu: 0.1906

Model description

More information needed

Intended uses & limitations

More information needed
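
In the absence of author-provided guidance, here is a minimal inference sketch. It assumes the checkpoint is pulled from the Hub under the repo id this card was generated for (contemmcm/5a559ef3e17402ca9cdf30875872b989) and that fine-tuning used plain source sentences as input; if the training script used a T5-style task prefix, the prompt below would need one as well.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id taken from this model card's page; adjust if the model moves.
model_id = "contemmcm/5a559ef3e17402ca9cdf30875872b989"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# English -> Polish, the direction of the opus_books en-pl fine-tuning.
# If training used a T5-style prefix, prepend e.g. "translate English to Polish: ".
inputs = tokenizer("The book is on the table.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```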

Training and evaluation data

More information needed
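
Until the authors fill this in, a minimal sketch of loading the dataset named above, assuming the standard datasets API. Note that opus_books ships a single train split, and this card does not say how the evaluation split was carved out of it.

```python
from datasets import load_dataset

# opus_books exposes language pairs as named configurations; "en-pl" matches this card.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-pl")

# Each example holds a "translation" dict keyed by language code.
example = dataset["train"][0]
print(example["translation"]["en"])  # English source
print(example["translation"]["pl"])  # Polish target
```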

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
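
As a reading aid only, a sketch of how the settings above might map onto Seq2SeqTrainingArguments; the actual training script is not published with this card, so the output directory name and any unlisted defaults are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch reconstructing the listed hyperparameters. With 4 GPUs launched via
# torchrun, a per-device batch size of 8 gives the reported total of 32.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-pl",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```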

Training results

The Data Size column appears to record the fraction of the training set used per epoch (it ramps from 0 to 1.0 over the first eight epochs); Epoch Runtime is presumably in seconds.

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 213.8008        | 0         | 3.0134        | 0.0129 |
| No log        | 1     | 70   | 194.3524        | 0.0078    | 3.2886        | 0.0141 |
| No log        | 2     | 140  | 171.2908        | 0.0156    | 5.3619        | 0.0184 |
| No log        | 3     | 210  | 152.6353        | 0.0312    | 7.0574        | 0.0246 |
| No log        | 4     | 280  | 124.8266        | 0.0625    | 9.0357        | 0.0207 |
| No log        | 5     | 350  | 78.2369         | 0.125     | 11.3829       | 0.0022 |
| No log        | 6     | 420  | 35.3158         | 0.25      | 15.4992       | 0.0023 |
| 12.4777       | 7     | 490  | 15.2392         | 0.5       | 22.6609       | 0.0053 |
| 20.8194       | 8     | 560  | 11.0651         | 1.0       | 37.3412       | 0.0202 |
| 17.2076       | 9     | 630  | 9.5666          | 1.0       | 36.2949       | 0.0231 |
| 14.1148       | 10    | 700  | 8.3794          | 1.0       | 36.6281       | 0.0186 |
| 12.9572       | 11    | 770  | 8.1267          | 1.0       | 36.5062       | 0.0241 |
| 12.1003       | 12    | 840  | 7.2995          | 1.0       | 36.5516       | 0.0570 |
| 10.7095       | 13    | 910  | 6.7302          | 1.0       | 35.7256       | 0.0468 |
| 10.1931       | 14    | 980  | 6.4492          | 1.0       | 36.3204       | 0.0324 |
| 9.4549        | 15    | 1050 | 6.0786          | 1.0       | 36.5121       | 0.0331 |
| 9.1282        | 16    | 1120 | 5.9003          | 1.0       | 36.5204       | 0.0487 |
| 8.7868        | 17    | 1190 | 5.7862          | 1.0       | 36.2028       | 0.1106 |
| 8.2096        | 18    | 1260 | 5.3347          | 1.0       | 36.7970       | 0.0970 |
| 7.9992        | 19    | 1330 | 5.6001          | 1.0       | 35.8356       | 0.0941 |
| 7.616         | 20    | 1400 | 5.1220          | 1.0       | 36.5630       | 0.1012 |
| 7.3592        | 21    | 1470 | 4.9429          | 1.0       | 36.7933       | 0.1258 |
| 7.1536        | 22    | 1540 | 4.8756          | 1.0       | 36.6035       | 0.1054 |
| 6.8459        | 23    | 1610 | 4.8313          | 1.0       | 36.8208       | 0.1041 |
| 6.6946        | 24    | 1680 | 4.6753          | 1.0       | 36.4579       | 0.1186 |
| 6.4103        | 25    | 1750 | 4.6408          | 1.0       | 36.2191       | 0.0901 |
| 6.2567        | 26    | 1820 | 4.2928          | 1.0       | 36.2414       | 0.0945 |
| 6.1588        | 27    | 1890 | 4.2098          | 1.0       | 37.2924       | 0.1223 |
| 5.9068        | 28    | 1960 | 4.2802          | 1.0       | 36.3854       | 0.1060 |
| 5.8006        | 29    | 2030 | 4.2359          | 1.0       | 36.7666       | 0.1153 |
| 5.5965        | 30    | 2100 | 4.2038          | 1.0       | 36.3399       | 0.1232 |
| 5.4559        | 31    | 2170 | 3.8622          | 1.0       | 36.2647       | 0.1375 |
| 5.4303        | 32    | 2240 | 4.0049          | 1.0       | 35.9153       | 0.0945 |
| 5.2074        | 33    | 2310 | 3.8682          | 1.0       | 36.3115       | 0.1643 |
| 5.1162        | 34    | 2380 | 3.7963          | 1.0       | 36.1916       | 0.1459 |
| 4.9818        | 35    | 2450 | 3.7318          | 1.0       | 36.6585       | 0.1548 |
| 4.904         | 36    | 2520 | 3.7209          | 1.0       | 36.0447       | 0.1618 |
| 4.7944        | 37    | 2590 | 3.6300          | 1.0       | 37.2573       | 0.1547 |
| 4.6902        | 38    | 2660 | 3.6020          | 1.0       | 36.4977       | 0.1935 |
| 4.5926        | 39    | 2730 | 3.5284          | 1.0       | 36.2447       | 0.1710 |
| 4.4951        | 40    | 2800 | 3.4828          | 1.0       | 35.7424       | 0.2004 |
| 4.3935        | 41    | 2870 | 3.4711          | 1.0       | 35.9381       | 0.1883 |
| 4.3465        | 42    | 2940 | 3.3897          | 1.0       | 36.2940       | 0.1837 |
| 4.2283        | 43    | 3010 | 3.3185          | 1.0       | 36.4905       | 0.1724 |
| 4.2004        | 44    | 3080 | 3.3778          | 1.0       | 35.7512       | 0.1805 |
| 4.0716        | 45    | 3150 | 3.3623          | 1.0       | 36.7176       | 0.1795 |
| 4.0405        | 46    | 3220 | 3.3109          | 1.0       | 35.5879       | 0.1736 |
| 3.9559        | 47    | 3290 | 3.2325          | 1.0       | 36.7739       | 0.1744 |
| 3.8721        | 48    | 3360 | 3.1804          | 1.0       | 36.2674       | 0.2187 |
| 3.8177        | 49    | 3430 | 3.1772          | 1.0       | 36.5749       | 0.2357 |
| 3.7468        | 50    | 3500 | 3.1867          | 1.0       | 36.3343       | 0.1906 |
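
The Bleu column is on a 0-1 scale rather than sacrebleu's usual 0-100; since the card does not state which implementation produced it, the sketch below uses the evaluate library's sacrebleu metric and rescales, purely as an assumption.

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

predictions = ["Książka leży na stole."]   # model outputs (illustrative)
references = [["Książka jest na stole."]]  # one or more references per prediction

result = sacrebleu.compute(predictions=predictions, references=references)
# sacrebleu reports 0-100; divide by 100 to match the table's apparent scale.
print(result["score"] / 100)
```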

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1