30b6d5e2e627af094ba1e2f77a3b7f34

This model is a fine-tuned version of google/umt5-small on the de-en subset of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 2.3223
  • Data Size: 1.0
  • Epoch Runtime: 199.7975
  • Bleu: 8.4393
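
Since the card itself does not document how to run the model, the following is a minimal inference sketch, assuming the checkpoint is loaded from the Hub under the repo id shown on this page. Whether the fine-tuning prepended a task prefix (e.g. "translate German to English: ") is undocumented, so feeding a raw German sentence is an assumption.

```python
# Minimal inference sketch (not part of the original card).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/30b6d5e2e627af094ba1e2f77a3b7f34"  # repo id from this page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: no task prefix; adjust if the training preprocessing used one.
inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```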

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
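
For reference, here is a sketch of how the settings above map onto transformers' Seq2SeqTrainingArguments. The actual training script is not published, so the output_dir and the predict_with_generate flag are assumptions; the per-device batch size of 8 across 4 GPUs gives the total batch size of 32 listed above.

```python
# Hypothetical reconstruction of the training configuration listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-en",  # assumption: path not documented
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 8 per device x 4 GPUs = 32 total
    per_device_eval_batch_size=8,    # 8 per device x 4 GPUs = 32 total
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumption: needed to compute BLEU at eval time
)
```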

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 12.9116         | 0         | 17.6181       | 0.1628 |
| No log        | 1     | 1286  | 12.3567         | 0.0078    | 20.1759       | 0.1366 |
| 0.3235        | 2     | 2572  | 10.7987         | 0.0156    | 20.6355       | 0.1596 |
| 0.3606        | 3     | 3858  | 7.6567          | 0.0312    | 23.9105       | 0.2835 |
| 0.4175        | 4     | 5144  | 5.7974          | 0.0625    | 30.2207       | 0.5039 |
| 6.1505        | 5     | 6430  | 4.4313          | 0.125     | 40.8792       | 2.7175 |
| 5.0784        | 6     | 7716  | 3.8856          | 0.25      | 63.2480       | 2.0305 |
| 4.3114        | 7     | 9002  | 3.3253          | 0.5       | 108.6120      | 6.6170 |
| 3.8944        | 8     | 10288 | 3.0586          | 1.0       | 200.4234      | 4.2922 |
| 3.6551        | 9     | 11574 | 2.9532          | 1.0       | 199.7025      | 4.7871 |
| 3.4927        | 10    | 12860 | 2.8676          | 1.0       | 199.1153      | 5.2797 |
| 3.3429        | 11    | 14146 | 2.8123          | 1.0       | 197.8543      | 5.5349 |
| 3.3370        | 12    | 15432 | 2.7654          | 1.0       | 198.4591      | 5.8157 |
| 3.2118        | 13    | 16718 | 2.7339          | 1.0       | 196.2994      | 5.9710 |
| 3.1490        | 14    | 18004 | 2.6913          | 1.0       | 198.0723      | 6.2055 |
| 3.1663        | 15    | 19290 | 2.6680          | 1.0       | 200.3472      | 6.3538 |
| 3.0674        | 16    | 20576 | 2.6426          | 1.0       | 199.5477      | 6.5225 |
| 3.0246        | 17    | 21862 | 2.6248          | 1.0       | 198.5243      | 6.6384 |
| 2.9689        | 18    | 23148 | 2.5998          | 1.0       | 197.3866      | 6.8006 |
| 2.9765        | 19    | 24434 | 2.5708          | 1.0       | 200.6827      | 6.8981 |
| 2.8914        | 20    | 25720 | 2.5516          | 1.0       | 199.5107      | 6.9797 |
| 2.8684        | 21    | 27006 | 2.5420          | 1.0       | 198.3039      | 7.0547 |
| 2.8500        | 22    | 28292 | 2.5293          | 1.0       | 197.9585      | 7.1925 |
| 2.8282        | 23    | 29578 | 2.5208          | 1.0       | 200.6032      | 7.2603 |
| 2.7726        | 24    | 30864 | 2.5024          | 1.0       | 197.8851      | 7.3176 |
| 2.7357        | 25    | 32150 | 2.4980          | 1.0       | 200.1812      | 7.3614 |
| 2.7725        | 26    | 33436 | 2.4802          | 1.0       | 199.0763      | 7.4978 |
| 2.7171        | 27    | 34722 | 2.4639          | 1.0       | 200.9008      | 7.5388 |
| 2.6695        | 28    | 36008 | 2.4513          | 1.0       | 198.5719      | 7.6035 |
| 2.6495        | 29    | 37294 | 2.4449          | 1.0       | 198.0566      | 7.6470 |
| 2.6386        | 30    | 38580 | 2.4400          | 1.0       | 200.9912      | 7.6968 |
| 2.5827        | 31    | 39866 | 2.4258          | 1.0       | 195.8189      | 7.7532 |
| 2.5419        | 32    | 41152 | 2.4131          | 1.0       | 201.7281      | 7.8490 |
| 2.5598        | 33    | 42438 | 2.4158          | 1.0       | 199.4755      | 7.8251 |
| 2.5966        | 34    | 43724 | 2.3967          | 1.0       | 198.6902      | 7.8911 |
| 2.4938        | 35    | 45010 | 2.4003          | 1.0       | 200.2784      | 7.9039 |
| 2.5534        | 36    | 46296 | 2.3925          | 1.0       | 198.4451      | 7.9345 |
| 2.4900        | 37    | 47582 | 2.3872          | 1.0       | 199.5405      | 8.0163 |
| 2.4999        | 38    | 48868 | 2.3732          | 1.0       | 199.3864      | 8.0583 |
| 2.4522        | 39    | 50154 | 2.3762          | 1.0       | 197.4525      | 8.1209 |
| 2.4022        | 40    | 51440 | 2.3729          | 1.0       | 200.3087      | 8.1312 |
| 2.3997        | 41    | 52726 | 2.3713          | 1.0       | 206.0565      | 8.1660 |
| 2.3966        | 42    | 54012 | 2.3538          | 1.0       | 198.6988      | 8.1874 |
| 2.3740        | 43    | 55298 | 2.3520          | 1.0       | 199.3506      | 8.2067 |
| 2.3653        | 44    | 56584 | 2.3518          | 1.0       | 197.8360      | 8.2437 |
| 2.3535        | 45    | 57870 | 2.3464          | 1.0       | 204.4928      | 8.3057 |
| 2.3751        | 46    | 59156 | 2.3391          | 1.0       | 198.7270      | 8.3135 |
| 2.3081        | 47    | 60442 | 2.3350          | 1.0       | 205.2120      | 8.3329 |
| 2.2866        | 48    | 61728 | 2.3404          | 1.0       | 200.8320      | 8.4193 |
| 2.2943        | 49    | 63014 | 2.3301          | 1.0       | 199.0114      | 8.4100 |
| 2.3002        | 50    | 64300 | 2.3223          | 1.0       | 199.7975      | 8.4393 |
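
The card does not state which BLEU implementation produced the Bleu column. A common choice alongside the framework versions below is the evaluate library's sacrebleu metric; the sketch here is an assumption, not the documented evaluation code.

```python
# Hedged sketch of a BLEU computation; requires `pip install evaluate sacrebleu`.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["The book is on the table."]   # decoded model outputs
references = [["The book is on the table."]]  # one list of references per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```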

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1