6d24a5eb768a198ae73b54fa54072735

This model is a fine-tuned version of google/long-t5-local-large on the Helsinki-NLP/opus_books [es-nl] dataset. It achieves the following results on the evaluation set:

Loss: 1.9430
Data Size: 1.0
Epoch Runtime: 349.3459
Bleu: 1.4024

Model description

More information needed

Intended uses & limitations

More information needed
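Until this section is filled in, the following is a minimal inference sketch, not a documented usage recipe. It assumes the checkpoint loads through the generic `AutoTokenizer` and `AutoModelForSeq2SeqLM` classes, that the translation direction is Spanish to Dutch, and that inputs are passed as raw text with no task prefix; none of these is confirmed by this card. The helper name `translate` and the `max_new_tokens` default are hypothetical.

```python
# Repository ID as shown on the model page.
MODEL_ID = "contemmcm/6d24a5eb768a198ae73b54fa54072735"


def translate(texts, model_id=MODEL_ID, max_new_tokens=128):
    """Translate a list of Spanish strings to Dutch (assumed direction)."""
    # Imported lazily so that defining this helper does not itself
    # require transformers and torch to be installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: no task prefix is prepended to the source text.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# translate(["El gato duerme en la silla."])
```

Given the reported evaluation BLEU of 1.4024, translations from this checkpoint may be unreliable and should be checked before use.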

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
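The per-device and total batch sizes above are related through the device count, and the final step count in the results table follows from the number of epochs. A short sketch of that arithmetic is given here. The mapping of the listed values onto `transformers` training-argument names is an assumption about how the run was configured, and the figure of 806 steps per epoch is read off the Step column of the training results table rather than listed as a hyperparameter.

```python
# Hyperparameters from the list above, keyed by the corresponding
# transformers TrainingArguments parameter names (assumed mapping).
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

NUM_DEVICES = 4        # num_devices in the list above
STEPS_PER_EPOCH = 806  # step increment per epoch in the results table

# No gradient accumulation is listed, so the total batch size is simply
# the per-device batch size times the number of devices.
total_train_batch_size = training_kwargs["per_device_train_batch_size"] * NUM_DEVICES
total_eval_batch_size = training_kwargs["per_device_eval_batch_size"] * NUM_DEVICES
total_steps = STEPS_PER_EPOCH * training_kwargs["num_train_epochs"]

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 32
print(total_steps)             # 40300
```

If the run used the Hugging Face trainer, a dictionary like this could be passed as keyword arguments to `Seq2SeqTrainingArguments` together with an `output_dir`.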

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	218.2100	0	25.0722	0.0029
No log	1	806	135.8632	0.0078	28.4186	0.0026
No log	2	1612	72.6377	0.0156	32.3281	0.0012
No log	3	2418	25.0309	0.0312	38.3946	0.0007
2.2782	4	3224	14.2780	0.0625	48.6758	0.0033
18.3293	5	4030	10.1719	0.125	69.3327	0.0069
12.7345	6	4836	7.7438	0.25	108.8321	0.0237
8.7676	7	5642	5.8010	0.5	189.2463	0.0223
5.9844	8.0	6448	4.0580	1.0	348.8722	0.0502
4.6451	9.0	7254	3.4634	1.0	352.2134	0.0946
4.0376	10.0	8060	3.0997	1.0	353.1616	0.1436
3.6333	11.0	8866	2.8809	1.0	350.2432	0.1828
3.4008	12.0	9672	2.7696	1.0	351.4368	0.1934
3.2335	13.0	10478	2.6920	1.0	350.9128	0.2562
3.0756	14.0	11284	2.6157	1.0	352.3415	0.2855
2.9579	15.0	12090	2.5355	1.0	351.0503	0.3491
2.8798	16.0	12896	2.5249	1.0	351.3586	0.3560
2.7897	17.0	13702	2.4580	1.0	350.1344	0.4317
2.7262	18.0	14508	2.4061	1.0	351.1663	0.4623
2.6962	19.0	15314	2.3797	1.0	349.7022	0.5310
2.6126	20.0	16120	2.3499	1.0	351.0222	0.5647
2.5806	21.0	16926	2.3145	1.0	351.7137	0.5577
2.5476	22.0	17732	2.2874	1.0	351.7715	0.5803
2.5206	23.0	18538	2.2583	1.0	353.4597	0.6327
2.4558	24.0	19344	2.2414	1.0	353.0918	0.6507
2.4214	25.0	20150	2.2215	1.0	352.0976	0.6475
2.3758	26.0	20956	2.1898	1.0	348.2238	0.7510
2.3366	27.0	21762	2.1735	1.0	349.3065	0.7297
2.3167	28.0	22568	2.1639	1.0	349.5059	0.7622
2.2714	29.0	23374	2.1394	1.0	351.9686	0.7805
2.2556	30.0	24180	2.1266	1.0	352.5284	0.8544
2.2305	31.0	24986	2.1105	1.0	355.9961	0.8285
2.1928	32.0	25792	2.1064	1.0	355.1299	0.8994
2.1717	33.0	26598	2.0856	1.0	351.6526	0.9018
2.1266	34.0	27404	2.0632	1.0	354.7243	0.9453
2.1371	35.0	28210	2.0498	1.0	352.0551	0.9594
2.0909	36.0	29016	2.0523	1.0	350.2846	0.9577
2.0623	37.0	29822	2.0360	1.0	351.6120	1.0107
2.0601	38.0	30628	2.0212	1.0	352.5911	1.0315
2.0236	39.0	31434	2.0179	1.0	349.9444	1.0745
1.9903	40.0	32240	1.9996	1.0	351.4016	1.1236
1.9811	41.0	33046	1.9928	1.0	351.0448	1.1127
1.9425	42.0	33852	1.9838	1.0	351.3833	1.1870
1.9200	43.0	34658	1.9741	1.0	349.1373	1.2144
1.8854	44.0	35464	1.9738	1.0	352.3969	1.2406
1.8909	45.0	36270	1.9711	1.0	353.0046	1.2970
1.8646	46.0	37076	1.9539	1.0	349.8124	1.3181
1.8158	47.0	37882	1.9544	1.0	349.7610	1.3287
1.8049	48.0	38688	1.9410	1.0	346.4040	1.3900
1.7911	49.0	39494	1.9471	1.0	347.3523	1.4239
1.7596	50.0	40300	1.9430	1.0	349.3459	1.4024
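The headline results at the top of this card come from the final row (epoch 50), which is not the best row by either metric. The sketch below picks the best epochs by validation loss and by BLEU. Only the last five rows are included, since every earlier row has a higher validation loss and a lower BLEU; the variable names are illustrative.

```python
# (epoch, validation_loss, bleu) copied from the last five table rows.
rows = [
    (46, 1.9539, 1.3181),
    (47, 1.9544, 1.3287),
    (48, 1.9410, 1.3900),
    (49, 1.9471, 1.4239),
    (50, 1.9430, 1.4024),
]

best_by_loss = min(rows, key=lambda r: r[1])  # lowest validation loss
best_by_bleu = max(rows, key=lambda r: r[2])  # highest BLEU

print(f"lowest validation loss: {best_by_loss[1]} at epoch {best_by_loss[0]}")
print(f"highest BLEU: {best_by_bleu[2]} at epoch {best_by_bleu[0]}")
```

The lowest validation loss (1.9410) occurs at epoch 48 and the highest BLEU (1.4239) at epoch 49, so the differences from the final checkpoint are small.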

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Model size: 0.8B params (Safetensors)
Tensor type: F32
