62085db05bc5534dc41ceb5bc26be7dc

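This model is a fine-tuned version of google/mt5-base on the de-it (German-Italian) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set (the values logged at epoch 36, the last row of the training results table):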

Loss: 1.8298
Data Size: 1.0
Epoch Runtime: 144.8149
Bleu: 6.3372

Model description

More information needed

Intended uses & limitations

More information needed
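
Until this section is filled in, the following is a minimal inference sketch rather than documented usage. It assumes the checkpoint is published under the repository id contemmcm/62085db05bc5534dc41ceb5bc26be7dc, that the translation direction is German to Italian, and that the model expects plain source text with no task prefix. None of these are confirmed by this card, and the `translate` helper and its default arguments are hypothetical.

```python
# Hypothetical usage sketch. The input format used during fine-tuning is not
# documented, so plain German text with no task prefix is an assumption.
MODEL_ID = "contemmcm/62085db05bc5534dc41ceb5bc26be7dc"


def translate(sentences, max_new_tokens=128, num_beams=4):
    """Translate a list of German sentences into Italian (assumed direction)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    model.eval()

    # Pad to the longest sentence so the batch forms a single tensor.
    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling `translate(["Das Buch liegt auf dem Tisch."])` should return a list containing one Italian string. With an evaluation BLEU of about 6.3, output quality is likely to be rough.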

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
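
The total batch sizes above follow from the per-device batch sizes and the device count. The sketch below reproduces that arithmetic. The dict layout and the `total_batch_size` helper are illustrative, and a gradient accumulation factor of 1 is an assumption (it is not listed above, but it is consistent with 8 × 4 = 32).

```python
# Values copied from the hyperparameter list; the dict layout is illustrative.
hparams = {
    "learning_rate": 5e-05,
    "train_batch_size": 8,   # per device
    "eval_batch_size": 8,    # per device
    "seed": 42,
    "num_devices": 4,
    "lr_scheduler_type": "constant",
    "num_epochs": 50,
}


def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Number of examples processed per optimizer (or evaluation) step."""
    # grad_accum_steps=1 is an assumption; the card does not list it.
    return per_device * num_devices * grad_accum_steps


total_train = total_batch_size(hparams["train_batch_size"], hparams["num_devices"])
total_eval = total_batch_size(hparams["eval_batch_size"], hparams["num_devices"])
print(total_train, total_eval)  # 32 32
```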

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	15.8234	0	12.1020	0.0148
No log	1	684	13.8607	0.0078	13.3868	0.0179
No log	2	1368	10.9505	0.0156	14.6276	0.0198
No log	3	2052	8.0553	0.0312	17.9701	0.0225
No log	4	2736	5.6037	0.0625	22.0481	0.0165
5.1811	5	3420	3.2013	0.125	30.1410	0.1047
3.7463	6	4104	2.6322	0.25	46.5008	1.6753
3.155	7	4788	2.3826	0.5	79.8135	2.5147
2.7999	8.0	5472	2.1990	1.0	150.8564	3.3208
2.6024	9.0	6156	2.1154	1.0	145.1827	3.8840
2.4879	10.0	6840	2.0683	1.0	144.4582	4.1739
2.3811	11.0	7524	2.0196	1.0	144.1503	4.5441
2.3104	12.0	8208	1.9883	1.0	145.1241	4.6686
2.2151	13.0	8892	1.9584	1.0	146.5486	5.0656
2.2034	14.0	9576	1.9320	1.0	143.5919	4.9911
2.1245	15.0	10260	1.9139	1.0	144.0957	5.2213
2.0575	16.0	10944	1.9004	1.0	144.0828	5.3511
2.0211	17.0	11628	1.8923	1.0	143.8787	5.4339
1.9842	18.0	12312	1.8751	1.0	143.5567	5.5857
1.9556	19.0	12996	1.8616	1.0	144.1475	5.6619
1.8982	20.0	13680	1.8570	1.0	143.8602	5.7691
1.8169	21.0	14364	1.8504	1.0	144.1139	5.9209
1.7965	22.0	15048	1.8405	1.0	143.6182	5.9149
1.7945	23.0	15732	1.8382	1.0	143.7970	5.9840
1.757	24.0	16416	1.8351	1.0	143.9385	6.0567
1.7164	25.0	17100	1.8295	1.0	143.7915	6.0723
1.6841	26.0	17784	1.8233	1.0	143.2531	6.1352
1.6742	27.0	18468	1.8271	1.0	143.7511	6.1618
1.6437	28.0	19152	1.8221	1.0	143.2447	6.1619
1.6257	29.0	19836	1.8267	1.0	144.3309	6.2036
1.5608	30.0	20520	1.8247	1.0	145.1515	6.2772
1.5455	31.0	21204	1.8231	1.0	144.4784	6.2524
1.5642	32.0	21888	1.8219	1.0	144.9063	6.3113
1.5197	33.0	22572	1.8277	1.0	143.5287	6.3368
1.4602	34.0	23256	1.8275	1.0	145.0946	6.3368
1.4472	35.0	23940	1.8330	1.0	144.3280	6.3680
1.4497	36.0	24624	1.8298	1.0	144.8149	6.3372
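
The reported metrics come from the final logged epoch, which is not the epoch with the lowest validation loss or the highest BLEU. As a small sketch of how the table can be inspected, the code below parses an excerpt (epochs 30 to 36, copied from the rows above) and picks out those epochs. The variable names are illustrative. The two extremes it finds within the excerpt are also the extremes of the full table.

```python
import csv
import io

# Excerpt of the training results table (tab-separated), epochs 30 to 36.
ROWS = """\
Training Loss\tEpoch\tStep\tValidation Loss\tData Size\tEpoch Runtime\tBleu
1.5608\t30.0\t20520\t1.8247\t1.0\t145.1515\t6.2772
1.5455\t31.0\t21204\t1.8231\t1.0\t144.4784\t6.2524
1.5642\t32.0\t21888\t1.8219\t1.0\t144.9063\t6.3113
1.5197\t33.0\t22572\t1.8277\t1.0\t143.5287\t6.3368
1.4602\t34.0\t23256\t1.8275\t1.0\t145.0946\t6.3368
1.4472\t35.0\t23940\t1.8330\t1.0\t144.3280\t6.3680
1.4497\t36.0\t24624\t1.8298\t1.0\t144.8149\t6.3372
"""

reader = csv.DictReader(io.StringIO(ROWS), delimiter="\t")
rows = [
    {
        "epoch": float(r["Epoch"]),
        "val_loss": float(r["Validation Loss"]),
        "bleu": float(r["Bleu"]),
    }
    for r in reader
]

best_loss = min(rows, key=lambda r: r["val_loss"])  # lowest validation loss
best_bleu = max(rows, key=lambda r: r["bleu"])      # highest BLEU
final = rows[-1]                                    # last logged epoch

print("lowest validation loss:", best_loss)
print("highest BLEU:", best_bleu)
print("final epoch:", final)
```

On this excerpt the lowest validation loss (1.8219) falls at epoch 32 and the highest BLEU (6.3680) at epoch 35, while the final epoch 36 gives the 1.8298 / 6.3372 reported at the top of the card.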

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Safetensors

Model size: 1.0B params
Tensor type: F32
