3a9bd0d94ea11766e0113a915c2b0f91

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [de-pt] dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5661
  • Data Size: 1.0
  • Epoch Runtime: 18.9176
  • Bleu: 0.8563

Model description

More information needed

Intended uses & limitations

More information needed
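
The card does not yet document usage, but the base model and language pair are known, so a minimal inference sketch is possible. The repo id below is taken from this card; whether the model expects raw German text or a task prefix is not documented, so plain text input is an assumption.

```python
# Minimal inference sketch for this checkpoint. Assumes the model accepts
# raw German text with no task prefix (the fine-tuning input format is
# not documented in this card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/3a9bd0d94ea11766e0113a915c2b0f91"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German -> Portuguese, matching the opus_books de-pt training pair.
inputs = tokenizer("Der Hund schläft im Garten.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```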

Training and evaluation data

More information needed
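
The corpus is named at the top of this card and can be loaded with the datasets library as a starting point. How the evaluation set was split off is not documented, so the split below is illustrative only.

```python
from datasets import load_dataset

# opus_books ships a single "train" split; the 90/10 split below is an
# illustrative assumption, not the split used for this card's results.
books = load_dataset("Helsinki-NLP/opus_books", "de-pt")
books = books["train"].train_test_split(test_size=0.1, seed=42)
print(books["train"][0]["translation"])  # {'de': '...', 'pt': '...'}
```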

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
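
A sketch of how these values map onto transformers Seq2SeqTrainingArguments, assuming a standard Trainer setup; the output directory is hypothetical, and the progressive "Data Size" ramp-up visible in the results table is custom behavior that stock arguments do not reproduce.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: hyperparameters are from the list above; output_dir is
# hypothetical, and the gradual data-size ramp-up seen in the results
# table is not captured by these stock arguments.
training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-opus-books-de-pt",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,   # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumed, needed to compute Bleu during eval
)
```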

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 240.5146        | 0         | 1.5895        | 0.0052 |
| No log        | 1     | 27   | 220.2246        | 0.0078    | 2.3560        | 0.0049 |
| No log        | 2     | 54   | 201.6398        | 0.0156    | 3.4008        | 0.0046 |
| No log        | 3     | 81   | 185.7308        | 0.0312    | 4.7109        | 0.0043 |
| No log        | 4     | 108  | 160.8329        | 0.0625    | 6.1394        | 0.0044 |
| No log        | 5     | 135  | 126.0964        | 0.125     | 8.2139        | 0.0044 |
| No log        | 6     | 162  | 80.8064         | 0.25      | 9.8625        | 0.0078 |
| No log        | 7     | 189  | 38.3810         | 0.5       | 13.0379       | 0.0050 |
| 21.5309       | 8.0   | 216  | 20.5337         | 1.0       | 19.3391       | 0.0070 |
| 21.5309       | 9.0   | 243  | 15.6982         | 1.0       | 18.6150       | 0.0756 |
| 29.9211       | 10.0  | 270  | 13.9766         | 1.0       | 19.0162       | 0.0394 |
| 29.9211       | 11.0  | 297  | 12.7335         | 1.0       | 18.1427       | 0.0363 |
| 20.664        | 12.0  | 324  | 11.3376         | 1.0       | 18.4502       | 0.0323 |
| 17.2601       | 13.0  | 351  | 11.3032         | 1.0       | 19.0537       | 0.0499 |
| 17.2601       | 14.0  | 378  | 9.5367          | 1.0       | 18.0391       | 0.0428 |
| 15.1396       | 15.0  | 405  | 9.7215          | 1.0       | 18.4896       | 0.0435 |
| 15.1396       | 16.0  | 432  | 8.9985          | 1.0       | 18.2890       | 0.0741 |
| 13.6693       | 17.0  | 459  | 8.7178          | 1.0       | 18.5686       | 0.0452 |
| 13.6693       | 18.0  | 486  | 8.0204          | 1.0       | 18.3974       | 0.1564 |
| 12.5207       | 19.0  | 513  | 7.8316          | 1.0       | 18.2894       | 0.1316 |
| 12.5207       | 20.0  | 540  | 7.6137          | 1.0       | 18.3578       | 0.2135 |
| 11.6421       | 21.0  | 567  | 7.3559          | 1.0       | 19.0919       | 0.2169 |
| 11.6421       | 22.0  | 594  | 7.2481          | 1.0       | 18.2427       | 0.3064 |
| 10.8325       | 23.0  | 621  | 7.3813          | 1.0       | 18.1103       | 0.3937 |
| 10.8325       | 24.0  | 648  | 6.6429          | 1.0       | 18.8096       | 0.4088 |
| 10.1643       | 25.0  | 675  | 6.5005          | 1.0       | 18.8014       | 0.5785 |
| 9.6446        | 26.0  | 702  | 6.7132          | 1.0       | 19.0756       | 0.2251 |
| 9.6446        | 27.0  | 729  | 6.4120          | 1.0       | 19.3266       | 0.4350 |
| 9.1617        | 28.0  | 756  | 6.2314          | 1.0       | 18.5467       | 0.6547 |
| 9.1617        | 29.0  | 783  | 5.9144          | 1.0       | 18.3461       | 0.5681 |
| 8.6776        | 30.0  | 810  | 6.0467          | 1.0       | 18.5372       | 0.4646 |
| 8.6776        | 31.0  | 837  | 5.9735          | 1.0       | 18.6964       | 0.3894 |
| 8.3499        | 32.0  | 864  | 5.8220          | 1.0       | 19.0345       | 0.4047 |
| 8.3499        | 33.0  | 891  | 5.8745          | 1.0       | 18.9467       | 0.5463 |
| 7.9403        | 34.0  | 918  | 5.5877          | 1.0       | 19.5259       | 0.5067 |
| 7.9403        | 35.0  | 945  | 5.5054          | 1.0       | 18.3917       | 0.5121 |
| 7.68          | 36.0  | 972  | 5.3874          | 1.0       | 19.1240       | 0.6175 |
| 7.68          | 37.0  | 999  | 5.6432          | 1.0       | 18.7083       | 0.4894 |
| 7.3719        | 38.0  | 1026 | 5.4467          | 1.0       | 18.7202       | 0.6100 |
| 7.0996        | 39.0  | 1053 | 5.1762          | 1.0       | 18.9310       | 0.7550 |
| 7.0996        | 40.0  | 1080 | 5.5259          | 1.0       | 18.5685       | 0.5951 |
| 6.8975        | 41.0  | 1107 | 5.3219          | 1.0       | 18.5159       | 0.6732 |
| 6.8975        | 42.0  | 1134 | 4.9517          | 1.0       | 18.7211       | 0.5251 |
| 6.6581        | 43.0  | 1161 | 4.8695          | 1.0       | 18.6375       | 0.7514 |
| 6.6581        | 44.0  | 1188 | 5.0998          | 1.0       | 18.6603       | 0.9022 |
| 6.4351        | 45.0  | 1215 | 4.8481          | 1.0       | 18.9359       | 0.7697 |
| 6.4351        | 46.0  | 1242 | 5.0892          | 1.0       | 19.1512       | 0.5868 |
| 6.2412        | 47.0  | 1269 | 4.8191          | 1.0       | 18.7786       | 0.7112 |
| 6.2412        | 48.0  | 1296 | 4.7702          | 1.0       | 18.8725       | 0.8425 |
| 6.0237        | 49.0  | 1323 | 4.5152          | 1.0       | 18.7832       | 0.6652 |
| 5.8837        | 50.0  | 1350 | 4.5661          | 1.0       | 18.9176       | 0.8563 |
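
The exact metric configuration behind the Bleu column is not documented. A common way to score translations is the evaluate wrapper around sacrebleu, shown below as a sketch with made-up sentences; note that sacrebleu reports on a 0-100 scale, so whether the values above are directly comparable is an assumption.

```python
import evaluate

# Sketch with made-up sentences; the metric settings used for this
# card's Bleu column are not documented.
bleu = evaluate.load("sacrebleu")
predictions = ["O cão dorme no jardim."]
references = [["O cachorro dorme no jardim."]]
print(bleu.compute(predictions=predictions, references=references)["score"])
```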

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1