a513e980b2cf367ab8eea6f4d1b6864e

This model is a fine-tuned version of google-t5/t5-small on the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4829
  • Data Size: 1.0 (fraction of the training set used in the final epoch)
  • Epoch Runtime: 12.8301
  • Bleu: 1.1859

Model description

More information needed

Intended uses & limitations

More information needed
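
In the absence of documented usage, here is a minimal inference sketch. The task prefix and target language below are assumptions (T5 checkpoints are conventionally prompted with a prefix such as "translate English to French: "); the language pair used for this fine-tune is not documented in this card.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/a513e980b2cf367ab8eea6f4d1b6864e"  # repo id from this card

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# The task prefix and language pair are assumptions; the card does not
# document which pair this model was trained on.
text = "translate English to French: The cat sat on the mat."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```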

Training and evaluation data

More information needed
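
For reference, Helsinki-NLP/opus_books is distributed as one configuration per language pair; a minimal loading sketch follows (the "en-fr" pair is an assumption, since the pair used for this run is not documented):

```python
from datasets import load_dataset

# opus_books requires a language-pair configuration; "en-fr" is an assumption.
books = load_dataset("Helsinki-NLP/opus_books", "en-fr")
print(books["train"][0])
# {'id': '0', 'translation': {'en': '...', 'fr': '...'}}
```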

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto training arguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
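
Below is a minimal sketch mapping these values onto transformers' Seq2SeqTrainingArguments. The output path is hypothetical, and the progressive data-size schedule visible in the results table is not expressible through standard arguments, so it is omitted.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters as Seq2SeqTrainingArguments.
# With 4 devices, a per-device batch size of 8 gives the reported total of 32.
args = Seq2SeqTrainingArguments(
    output_dir="t5-small-opus-books",   # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,         # needed to compute BLEU at eval time
)
```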

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---------------|-------|------|-----------------|-----------|---------------|--------|
| No log | 0 | 0 | 4.9196 | 0 | 1.7581 | 0.0452 |
| No log | 1 | 86 | 4.9036 | 0.0078 | 2.7906 | 0.0456 |
| No log | 2 | 172 | 4.7683 | 0.0156 | 2.1189 | 0.0489 |
| No log | 3 | 258 | 4.6167 | 0.0312 | 2.2456 | 0.0575 |
| No log | 4 | 344 | 4.4223 | 0.0625 | 2.6220 | 0.0855 |
| 0.2495 | 5 | 430 | 4.2693 | 0.125 | 3.2169 | 0.0880 |
| 1.1137 | 6 | 516 | 4.0653 | 0.25 | 4.8851 | 0.1141 |
| 1.4913 | 7 | 602 | 3.8116 | 0.5 | 7.7852 | 0.1640 |
| 2.244 | 8 | 688 | 3.5580 | 1.0 | 14.7651 | 0.1932 |
| 3.6859 | 9 | 774 | 3.4243 | 1.0 | 13.0133 | 0.3036 |
| 3.5677 | 10 | 860 | 3.3250 | 1.0 | 12.7559 | 0.3887 |
| 3.5001 | 11 | 946 | 3.2470 | 1.0 | 13.7381 | 0.4316 |
| 3.4039 | 12 | 1032 | 3.1842 | 1.0 | 12.9902 | 0.4563 |
| 3.351 | 13 | 1118 | 3.1301 | 1.0 | 13.4499 | 0.5009 |
| 3.2783 | 14 | 1204 | 3.0783 | 1.0 | 13.0028 | 0.5354 |
| 3.2463 | 15 | 1290 | 3.0364 | 1.0 | 12.6125 | 0.5338 |
| 3.1979 | 16 | 1376 | 3.0006 | 1.0 | 12.5225 | 0.5516 |
| 3.1577 | 17 | 1462 | 2.9654 | 1.0 | 12.9970 | 0.5683 |
| 3.1142 | 18 | 1548 | 2.9309 | 1.0 | 12.5788 | 0.6017 |
| 3.075 | 19 | 1634 | 2.9053 | 1.0 | 12.3158 | 0.6179 |
| 3.0465 | 20 | 1720 | 2.8779 | 1.0 | 12.5898 | 0.6423 |
| 3.0153 | 21 | 1806 | 2.8523 | 1.0 | 11.9801 | 0.6640 |
| 3.0035 | 22 | 1892 | 2.8295 | 1.0 | 11.9979 | 0.7202 |
| 2.9598 | 23 | 1978 | 2.8069 | 1.0 | 12.1205 | 0.7813 |
| 2.9328 | 24 | 2064 | 2.7873 | 1.0 | 12.3217 | 0.7962 |
| 2.893 | 25 | 2150 | 2.7698 | 1.0 | 12.6392 | 0.8028 |
| 2.8921 | 26 | 2236 | 2.7514 | 1.0 | 12.7615 | 0.8282 |
| 2.8411 | 27 | 2322 | 2.7332 | 1.0 | 13.4124 | 0.8155 |
| 2.8286 | 28 | 2408 | 2.7148 | 1.0 | 12.4894 | 0.8278 |
| 2.8242 | 29 | 2494 | 2.7044 | 1.0 | 12.9070 | 0.8499 |
| 2.7917 | 30 | 2580 | 2.6899 | 1.0 | 12.9505 | 0.8816 |
| 2.7827 | 31 | 2666 | 2.6755 | 1.0 | 12.6046 | 0.8929 |
| 2.7398 | 32 | 2752 | 2.6666 | 1.0 | 11.6782 | 0.8981 |
| 2.7315 | 33 | 2838 | 2.6461 | 1.0 | 12.5770 | 0.9176 |
| 2.7199 | 34 | 2924 | 2.6410 | 1.0 | 13.1229 | 0.9124 |
| 2.7127 | 35 | 3010 | 2.6209 | 1.0 | 14.5130 | 0.9174 |
| 2.6797 | 36 | 3096 | 2.6066 | 1.0 | 14.3944 | 0.9410 |
| 2.6753 | 37 | 3182 | 2.6019 | 1.0 | 14.1127 | 0.9272 |
| 2.646 | 38 | 3268 | 2.5858 | 1.0 | 13.5025 | 0.9313 |
| 2.625 | 39 | 3354 | 2.5758 | 1.0 | 13.6073 | 0.9706 |
| 2.6172 | 40 | 3440 | 2.5639 | 1.0 | 13.0175 | 1.0059 |
| 2.6094 | 41 | 3526 | 2.5551 | 1.0 | 12.3109 | 1.0225 |
| 2.5961 | 42 | 3612 | 2.5475 | 1.0 | 12.4898 | 1.0111 |
| 2.5635 | 43 | 3698 | 2.5383 | 1.0 | 11.6926 | 1.0706 |
| 2.5724 | 44 | 3784 | 2.5275 | 1.0 | 11.6896 | 1.1004 |
| 2.5536 | 45 | 3870 | 2.5211 | 1.0 | 13.1903 | 1.1308 |
| 2.518 | 46 | 3956 | 2.5143 | 1.0 | 12.5735 | 1.1436 |
| 2.5136 | 47 | 4042 | 2.5037 | 1.0 | 13.7180 | 1.1571 |
| 2.4721 | 48 | 4128 | 2.4929 | 1.0 | 12.4402 | 1.1575 |
| 2.489 | 49 | 4214 | 2.4898 | 1.0 | 12.5055 | 1.2060 |
| 2.4664 | 50 | 4300 | 2.4829 | 1.0 | 12.8301 | 1.1859 |
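
The Bleu column is on the same scale as the final reported score (1.1859). Below is a minimal sketch of computing BLEU with the sacrebleu metric through the evaluate library; this pairing is an assumption, as the exact metric setup used for this run is not documented.

```python
import evaluate

bleu = evaluate.load("sacrebleu")

predictions = ["Le chat est assis sur le tapis."]      # model outputs
references = [["Le chat s'est assis sur le tapis."]]   # list of references per prediction

result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus-level BLEU on a 0-100 scale
```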

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1