de73f10b4bc545baba6a0ac70767ea89

This model is a fine-tuned version of google-t5/t5-small on the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3712
  • Data Size: 1.0 (fraction of the training set used)
  • Epoch Runtime: 132.1769
  • Bleu: 6.0814
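The checkpoint can be loaded with the standard transformers API. The sketch below is a minimal example, not taken from this card: the task prefix assumes an English-to-French setup, which the card does not document.

```python
# Minimal usage sketch. The repo id comes from this card; the task prefix
# assumes an English-French pair, which the card does not document.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

repo_id = "contemmcm/de73f10b4bc545baba6a0ac70767ea89"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id)

text = "translate English to French: The book is on the table."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```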

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
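The card names the dataset but not the language pair or split. A hedged sketch of loading it, assuming the "en-fr" configuration (opus_books requires choosing a language pair):

```python
# Sketch of loading the dataset named in this card. The "en-fr" config is
# an assumption; the card does not say which pair was used.
from datasets import load_dataset

books = load_dataset("Helsinki-NLP/opus_books", "en-fr")
print(books["train"][0])
# -> {'id': '0', 'translation': {'en': '...', 'fr': '...'}}
```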

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding training arguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
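As a rough illustration, these values could map onto a transformers Seq2SeqTrainingArguments object as below; output_dir and any value not listed above are assumptions, not taken from this card.

```python
# Hedged sketch mapping the listed hyperparameters onto training arguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="out",               # assumption: not stated in the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 devices -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 devices -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```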

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 4.6458 | 0 | 11.7817 | 0.1613 |
| No log | 1 | 1000 | 4.2997 | 0.0078 | 13.9839 | 0.2020 |
| No log | 2 | 2000 | 4.0161 | 0.0156 | 14.3451 | 0.2473 |
| No log | 3 | 3000 | 3.8014 | 0.0312 | 16.3804 | 0.2429 |
| 0.1455 | 4 | 4000 | 3.6216 | 0.0625 | 22.6181 | 0.3368 |
| 3.7679 | 5 | 5000 | 3.4273 | 0.125 | 30.8931 | 0.4199 |
| 0.2175 | 6 | 6000 | 3.1929 | 0.25 | 43.9417 | 0.5916 |
| 0.291 | 7 | 7000 | 2.9170 | 0.5 | 77.3792 | 1.1880 |
| 2.8856 | 8 | 8000 | 2.6146 | 1.0 | 143.7546 | 1.6428 |
| 2.6855 | 9 | 9000 | 2.4309 | 1.0 | 143.5087 | 1.9918 |
| 2.5364 | 10 | 10000 | 2.2938 | 1.0 | 144.2288 | 2.2682 |
| 2.4361 | 11 | 11000 | 2.1900 | 1.0 | 149.5659 | 2.5739 |
| 2.3464 | 12 | 12000 | 2.1049 | 1.0 | 142.6188 | 2.8246 |
| 2.2631 | 13 | 13000 | 2.0329 | 1.0 | 153.2539 | 3.0344 |
| 2.2136 | 14 | 14000 | 1.9732 | 1.0 | 155.4876 | 3.2309 |
| 2.1284 | 15 | 15000 | 1.9219 | 1.0 | 153.2130 | 3.3932 |
| 2.0955 | 16 | 16000 | 1.8782 | 1.0 | 138.6666 | 3.5720 |
| 2.0283 | 17 | 17000 | 1.8365 | 1.0 | 164.6555 | 3.7042 |
| 2.0186 | 18 | 18000 | 1.8006 | 1.0 | 138.7472 | 3.8593 |
| 1.969 | 19 | 19000 | 1.7677 | 1.0 | 139.7041 | 3.9740 |
| 1.9141 | 20 | 20000 | 1.7369 | 1.0 | 152.2548 | 4.1293 |
| 1.9024 | 21 | 21000 | 1.7068 | 1.0 | 133.5899 | 4.2375 |
| 1.8537 | 22 | 22000 | 1.6850 | 1.0 | 132.4430 | 4.3756 |
| 1.8305 | 23 | 23000 | 1.6603 | 1.0 | 134.8845 | 4.4725 |
| 1.7969 | 24 | 24000 | 1.6410 | 1.0 | 164.0537 | 4.5768 |
| 1.8096 | 25 | 25000 | 1.6214 | 1.0 | 137.1758 | 4.6656 |
| 1.7553 | 26 | 26000 | 1.6026 | 1.0 | 162.5463 | 4.7777 |
| 1.7342 | 27 | 27000 | 1.5850 | 1.0 | 132.8044 | 4.8680 |
| 1.6983 | 28 | 28000 | 1.5707 | 1.0 | 136.7322 | 4.9598 |
| 1.7111 | 29 | 29000 | 1.5545 | 1.0 | 146.3394 | 5.0253 |
| 1.6679 | 30 | 30000 | 1.5409 | 1.0 | 145.4068 | 5.0856 |
| 1.6672 | 31 | 31000 | 1.5253 | 1.0 | 149.6627 | 5.1691 |
| 1.6531 | 32 | 32000 | 1.5168 | 1.0 | 137.6426 | 5.2505 |
| 1.5978 | 33 | 33000 | 1.5047 | 1.0 | 138.8876 | 5.2911 |
| 1.5973 | 34 | 34000 | 1.4932 | 1.0 | 145.4915 | 5.3297 |
| 1.5642 | 35 | 35000 | 1.4814 | 1.0 | 128.1568 | 5.4291 |
| 1.5677 | 36 | 36000 | 1.4746 | 1.0 | 133.6715 | 5.4750 |
| 1.5557 | 37 | 37000 | 1.4607 | 1.0 | 128.2057 | 5.5561 |
| 1.5574 | 38 | 38000 | 1.4525 | 1.0 | 124.6804 | 5.5801 |
| 1.5186 | 39 | 39000 | 1.4487 | 1.0 | 127.0441 | 5.6308 |
| 1.5115 | 40 | 40000 | 1.4393 | 1.0 | 134.0554 | 5.6973 |
| 1.5055 | 41 | 41000 | 1.4278 | 1.0 | 128.4823 | 5.7426 |
| 1.4933 | 42 | 42000 | 1.4191 | 1.0 | 130.1083 | 5.7839 |
| 1.4835 | 43 | 43000 | 1.4152 | 1.0 | 127.8246 | 5.8283 |
| 1.4572 | 44 | 44000 | 1.4068 | 1.0 | 128.7732 | 5.8464 |
| 1.4554 | 45 | 45000 | 1.4028 | 1.0 | 133.7048 | 5.9022 |
| 1.4694 | 46 | 46000 | 1.3929 | 1.0 | 136.7006 | 5.9434 |
| 1.4448 | 47 | 47000 | 1.3852 | 1.0 | 131.0279 | 5.9637 |
| 1.4448 | 48 | 48000 | 1.3817 | 1.0 | 130.6594 | 6.0077 |
| 1.402 | 49 | 49000 | 1.3777 | 1.0 | 133.9181 | 6.0371 |
| 1.4073 | 50 | 50000 | 1.3712 | 1.0 | 132.1769 | 6.0814 |
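The Bleu column above could be reproduced with the sacrebleu metric; the sketch below assumes the evaluate library, which this card does not confirm, and uses placeholder predictions and references.

```python
# Hedged sketch of computing BLEU as reported in the table above.
# The evaluate + sacrebleu combination is an assumption, not from this card.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Le livre est sur la table."]          # placeholder model outputs
references = [["Le livre est sur la table."]]         # placeholder gold texts
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # 100.0 for an exact match
```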

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1