5a559ef3e17402ca9cdf30875872b989

This model is a fine-tuned version of google/long-t5-local-large on the en-pl configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1867
  • Data Size: 1.0
  • Epoch Runtime: 36.3343
  • Bleu: 0.1906

Model description

More information needed

Intended uses & limitations

More information needed
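
In the absence of author-provided guidance, here is a minimal inference sketch. It assumes the checkpoint is pulled from the Hub under the repo id this card was generated for (contemmcm/5a559ef3e17402ca9cdf30875872b989) and that fine-tuning used plain source sentences as input; if the training script used a T5-style task prefix, the prompt below would need one as well.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Repo id taken from this model card's page; adjust if the model moves.
model_id = "contemmcm/5a559ef3e17402ca9cdf30875872b989"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# English -> Polish, the direction of the opus_books en-pl fine-tuning.
# If training used a T5-style prefix, prepend e.g. "translate English to Polish: ".
inputs = tokenizer("The book is on the table.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```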

Training and evaluation data

More information needed
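
Until the authors fill this in, a minimal sketch of loading the dataset named above, assuming the standard datasets API. Note that opus_books ships a single train split, and this card does not say how the evaluation split was carved out of it.

```python
from datasets import load_dataset

# opus_books exposes language pairs as named configurations; "en-pl" matches this card.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-pl")

# Each example holds a "translation" dict keyed by language code.
example = dataset["train"][0]
print(example["translation"]["en"])  # English source
print(example["translation"]["pl"])  # Polish target
```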

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
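
As a reading aid only, a sketch of how the settings above might map onto Seq2SeqTrainingArguments; the actual training script is not published with this card, so the output directory name and any unlisted defaults are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch reconstructing the listed hyperparameters. With 4 GPUs launched via
# torchrun, a per-device batch size of 8 gives the reported total of 32.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-local-large-opus-books-en-pl",  # hypothetical name
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```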

Training results

The Data Size column appears to record the fraction of the training set used per epoch (it ramps from 0 to 1.0 over the first eight epochs); Epoch Runtime is presumably in seconds.

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 213.8008        | 0         | 3.0134        | 0.0129 |
| No log        | 1     | 70   | 194.3524        | 0.0078    | 3.2886        | 0.0141 |
| No log        | 2     | 140  | 171.2908        | 0.0156    | 5.3619        | 0.0184 |
| No log        | 3     | 210  | 152.6353        | 0.0312    | 7.0574        | 0.0246 |
| No log        | 4     | 280  | 124.8266        | 0.0625    | 9.0357        | 0.0207 |
| No log        | 5     | 350  | 78.2369         | 0.125     | 11.3829       | 0.0022 |
| No log        | 6     | 420  | 35.3158         | 0.25      | 15.4992       | 0.0023 |
| 12.4777       | 7     | 490  | 15.2392         | 0.5       | 22.6609       | 0.0053 |
| 20.8194       | 8     | 560  | 11.0651         | 1.0       | 37.3412       | 0.0202 |
| 17.2076       | 9     | 630  | 9.5666          | 1.0       | 36.2949       | 0.0231 |
| 14.1148       | 10    | 700  | 8.3794          | 1.0       | 36.6281       | 0.0186 |
| 12.9572       | 11    | 770  | 8.1267          | 1.0       | 36.5062       | 0.0241 |
| 12.1003       | 12    | 840  | 7.2995          | 1.0       | 36.5516       | 0.0570 |
| 10.7095       | 13    | 910  | 6.7302          | 1.0       | 35.7256       | 0.0468 |
| 10.1931       | 14    | 980  | 6.4492          | 1.0       | 36.3204       | 0.0324 |
| 9.4549        | 15    | 1050 | 6.0786          | 1.0       | 36.5121       | 0.0331 |
| 9.1282        | 16    | 1120 | 5.9003          | 1.0       | 36.5204       | 0.0487 |
| 8.7868        | 17    | 1190 | 5.7862          | 1.0       | 36.2028       | 0.1106 |
| 8.2096        | 18    | 1260 | 5.3347          | 1.0       | 36.7970       | 0.0970 |
| 7.9992        | 19    | 1330 | 5.6001          | 1.0       | 35.8356       | 0.0941 |
| 7.616         | 20    | 1400 | 5.1220          | 1.0       | 36.5630       | 0.1012 |
| 7.3592        | 21    | 1470 | 4.9429          | 1.0       | 36.7933       | 0.1258 |
| 7.1536        | 22    | 1540 | 4.8756          | 1.0       | 36.6035       | 0.1054 |
| 6.8459        | 23    | 1610 | 4.8313          | 1.0       | 36.8208       | 0.1041 |
| 6.6946        | 24    | 1680 | 4.6753          | 1.0       | 36.4579       | 0.1186 |
| 6.4103        | 25    | 1750 | 4.6408          | 1.0       | 36.2191       | 0.0901 |
| 6.2567        | 26    | 1820 | 4.2928          | 1.0       | 36.2414       | 0.0945 |
| 6.1588        | 27    | 1890 | 4.2098          | 1.0       | 37.2924       | 0.1223 |
| 5.9068        | 28    | 1960 | 4.2802          | 1.0       | 36.3854       | 0.1060 |
| 5.8006        | 29    | 2030 | 4.2359          | 1.0       | 36.7666       | 0.1153 |
| 5.5965        | 30    | 2100 | 4.2038          | 1.0       | 36.3399       | 0.1232 |
| 5.4559        | 31    | 2170 | 3.8622          | 1.0       | 36.2647       | 0.1375 |
| 5.4303        | 32    | 2240 | 4.0049          | 1.0       | 35.9153       | 0.0945 |
| 5.2074        | 33    | 2310 | 3.8682          | 1.0       | 36.3115       | 0.1643 |
| 5.1162        | 34    | 2380 | 3.7963          | 1.0       | 36.1916       | 0.1459 |
| 4.9818        | 35    | 2450 | 3.7318          | 1.0       | 36.6585       | 0.1548 |
| 4.904         | 36    | 2520 | 3.7209          | 1.0       | 36.0447       | 0.1618 |
| 4.7944        | 37    | 2590 | 3.6300          | 1.0       | 37.2573       | 0.1547 |
| 4.6902        | 38    | 2660 | 3.6020          | 1.0       | 36.4977       | 0.1935 |
| 4.5926        | 39    | 2730 | 3.5284          | 1.0       | 36.2447       | 0.1710 |
| 4.4951        | 40    | 2800 | 3.4828          | 1.0       | 35.7424       | 0.2004 |
| 4.3935        | 41    | 2870 | 3.4711          | 1.0       | 35.9381       | 0.1883 |
| 4.3465        | 42    | 2940 | 3.3897          | 1.0       | 36.2940       | 0.1837 |
| 4.2283        | 43    | 3010 | 3.3185          | 1.0       | 36.4905       | 0.1724 |
| 4.2004        | 44    | 3080 | 3.3778          | 1.0       | 35.7512       | 0.1805 |
| 4.0716        | 45    | 3150 | 3.3623          | 1.0       | 36.7176       | 0.1795 |
| 4.0405        | 46    | 3220 | 3.3109          | 1.0       | 35.5879       | 0.1736 |
| 3.9559        | 47    | 3290 | 3.2325          | 1.0       | 36.7739       | 0.1744 |
| 3.8721        | 48    | 3360 | 3.1804          | 1.0       | 36.2674       | 0.2187 |
| 3.8177        | 49    | 3430 | 3.1772          | 1.0       | 36.5749       | 0.2357 |
| 3.7468        | 50    | 3500 | 3.1867          | 1.0       | 36.3343       | 0.1906 |
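
The Bleu column is on a 0-1 scale rather than sacrebleu's usual 0-100; since the card does not state which implementation produced it, the sketch below uses the evaluate library's sacrebleu metric and rescales, purely as an assumption.

```python
import evaluate

sacrebleu = evaluate.load("sacrebleu")

predictions = ["Książka leży na stole."]   # model outputs (illustrative)
references = [["Książka jest na stole."]]  # one or more references per prediction

result = sacrebleu.compute(predictions=predictions, references=references)
# sacrebleu reports 0-100; divide by 100 to match the table's apparent scale.
print(result["score"] / 100)
```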

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1