ece5ab9ed976091d6c72ef23b91fc802

This model is a fine-tuned version of google/mt5-small on the de-en configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0637
  • Data Size: 1.0
  • Epoch Runtime: 183.9056
  • Bleu: 8.3106
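
Since the card itself carries little usage detail, here is a minimal German-to-English inference sketch. It assumes the Hugging Face repository id contemmcm/ece5ab9ed976091d6c72ef23b91fc802 and that no task prefix is required; neither assumption is confirmed by this card.

```python
# Minimal de->en inference sketch. The repo id and the absence of a task
# prefix are assumptions; the card does not document either.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "contemmcm/ece5ab9ed976091d6c72ef23b91fc802"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```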

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
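
The dataset named in the summary can be pulled directly from the Hub. Below is a minimal loading sketch for the de-en configuration of Helsinki-NLP/opus_books; how the evaluation split was carved out is not stated on this card, so no split handling is shown.

```python
# Load the de-en configuration of opus_books. The card does not document
# how evaluation data was held out, so only the raw dataset is shown.
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus_books", "de-en")
example = dataset["train"][0]["translation"]
print(example["de"], "->", example["en"])
```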

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
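
The listed values map onto a Seq2SeqTrainingArguments configuration roughly as sketched below; the per-device batch size of 8 across 4 GPUs yields the total batch size of 32. The output_dir, evaluation cadence, and predict_with_generate flag are assumptions, since the card does not publish the actual training script.

```python
# Sketch of training arguments mirroring the hyperparameter list above.
# output_dir, eval_strategy, and predict_with_generate are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-de-en",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    eval_strategy="epoch",           # assumed; the table reports per-epoch metrics
    predict_with_generate=True,      # assumed; needed to compute BLEU at eval time
)
```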

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 25.8422         | 0         | 15.7176       | 0.0015 |
| No log        | 1     | 1286  | 18.2875         | 0.0078    | 17.4798       | 0.0036 |
| 0.4342        | 2     | 2572  | 12.3803         | 0.0156    | 19.1922       | 0.0046 |
| 0.419         | 3     | 3858  | 8.5398          | 0.0312    | 22.0140       | 0.0087 |
| 0.4323        | 4     | 5144  | 4.7204          | 0.0625    | 27.2403       | 0.0366 |
| 4.8912        | 5     | 6430  | 3.5329          | 0.125     | 38.2339       | 0.8359 |
| 4.1474        | 6     | 7716  | 3.1755          | 0.25      | 59.1255       | 1.8778 |
| 3.7366        | 7     | 9002  | 2.9314          | 0.5       | 102.0389      | 2.6961 |
| 3.4324        | 8     | 10288 | 2.7399          | 1.0       | 182.3252      | 3.9365 |
| 3.2368        | 9     | 11574 | 2.6477          | 1.0       | 182.4657      | 4.2084 |
| 3.0952        | 10    | 12860 | 2.5687          | 1.0       | 181.8133      | 4.8790 |
| 2.9661        | 11    | 14146 | 2.5202          | 1.0       | 178.4277      | 5.2568 |
| 2.9532        | 12    | 15432 | 2.4762          | 1.0       | 181.7542      | 5.4523 |
| 2.8393        | 13    | 16718 | 2.4361          | 1.0       | 178.8124      | 5.6336 |
| 2.8024        | 14    | 18004 | 2.4072          | 1.0       | 198.5045      | 5.9432 |
| 2.8015        | 15    | 19290 | 2.3840          | 1.0       | 187.5317      | 5.9987 |
| 2.6931        | 16    | 20576 | 2.3576          | 1.0       | 180.5806      | 6.1513 |
| 2.675         | 17    | 21862 | 2.3377          | 1.0       | 180.5626      | 6.2539 |
| 2.6371        | 18    | 23148 | 2.3205          | 1.0       | 183.0677      | 6.4511 |
| 2.6272        | 19    | 24434 | 2.2969          | 1.0       | 183.5103      | 6.5279 |
| 2.5562        | 20    | 25720 | 2.2797          | 1.0       | 184.3482      | 6.7305 |
| 2.5346        | 21    | 27006 | 2.2661          | 1.0       | 182.3010      | 6.6796 |
| 2.511         | 22    | 28292 | 2.2504          | 1.0       | 182.9612      | 6.9751 |
| 2.5031        | 23    | 29578 | 2.2401          | 1.0       | 183.1818      | 7.0070 |
| 2.455         | 24    | 30864 | 2.2251          | 1.0       | 186.2117      | 7.0644 |
| 2.4241        | 25    | 32150 | 2.2154          | 1.0       | 186.9180      | 7.0505 |
| 2.4445        | 26    | 33436 | 2.2056          | 1.0       | 187.5196      | 7.2376 |
| 2.3923        | 27    | 34722 | 2.1965          | 1.0       | 192.6508      | 7.2181 |
| 2.3642        | 28    | 36008 | 2.1862          | 1.0       | 194.6447      | 7.3394 |
| 2.3329        | 29    | 37294 | 2.1786          | 1.0       | 188.6610      | 7.3962 |
| 2.3296        | 30    | 38580 | 2.1667          | 1.0       | 187.3074      | 7.4217 |
| 2.2623        | 31    | 39866 | 2.1605          | 1.0       | 186.3852      | 7.6256 |
| 2.2493        | 32    | 41152 | 2.1522          | 1.0       | 189.2922      | 7.5598 |
| 2.2623        | 33    | 42438 | 2.1510          | 1.0       | 187.9119      | 7.6353 |
| 2.2926        | 34    | 43724 | 2.1389          | 1.0       | 186.6194      | 7.7151 |
| 2.1919        | 35    | 45010 | 2.1375          | 1.0       | 186.7181      | 7.6615 |
| 2.2415        | 36    | 46296 | 2.1297          | 1.0       | 187.6718      | 7.7543 |
| 2.2026        | 37    | 47582 | 2.1246          | 1.0       | 187.8164      | 7.7978 |
| 2.1982        | 38    | 48868 | 2.1098          | 1.0       | 187.1606      | 7.8684 |
| 2.164         | 39    | 50154 | 2.1155          | 1.0       | 187.8143      | 8.0013 |
| 2.1166        | 40    | 51440 | 2.1168          | 1.0       | 188.3499      | 7.9425 |
| 2.1146        | 41    | 52726 | 2.1078          | 1.0       | 186.1224      | 8.0526 |
| 2.1115        | 42    | 54012 | 2.0968          | 1.0       | 184.8816      | 7.9680 |
| 2.1006        | 43    | 55298 | 2.0926          | 1.0       | 183.8539      | 8.0582 |
| 2.0808        | 44    | 56584 | 2.0892          | 1.0       | 190.0827      | 8.0700 |
| 2.0633        | 45    | 57870 | 2.0849          | 1.0       | 189.3815      | 8.0860 |
| 2.0851        | 46    | 59156 | 2.0788          | 1.0       | 183.5700      | 8.1511 |
| 2.0412        | 47    | 60442 | 2.0803          | 1.0       | 186.8171      | 8.0862 |
| 2.0088        | 48    | 61728 | 2.0873          | 1.0       | 191.0245      | 8.1643 |
| 2.0184        | 49    | 63014 | 2.0724          | 1.0       | 186.8023      | 8.1502 |
| 2.027         | 50    | 64300 | 2.0637          | 1.0       | 183.9056      | 8.3106 |
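
The Bleu column above can be reproduced with a sacrebleu-style metric over the generated translations. Below is a minimal scoring sketch; the evaluate metric and the example strings are illustrative assumptions, not the card's actual evaluation code.

```python
# Sketch of BLEU scoring with the evaluate library (illustrative only;
# the card does not publish its exact evaluation pipeline).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The book is on the table."]
references = [["The book lies on the table."]]
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # BLEU on a 0-100 scale, as in the table above
```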

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1