94bb060ec0754e3f06a89d47aaf32471

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [de-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1341
  • Data Size: 1.0 (fraction of the training set used)
  • Epoch Runtime: 69.4062 s
  • Bleu: 7.1151
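
For quick testing, a minimal inference sketch is shown below. It assumes the checkpoint is hosted as contemmcm/94bb060ec0754e3f06a89d47aaf32471 (the repo id shown for this model) and that no task prefix was used during fine-tuning; if training followed the stock T5 translation recipe, a prefix such as "translate German to Russian: " may be needed.

```python
# Minimal inference sketch; the repo id and generation settings are
# illustrative assumptions, not taken from the training configuration.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/94bb060ec0754e3f06a89d47aaf32471"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# German input in the opus_books domain (literary prose).
text = "Der Himmel war klar, und die Sterne leuchteten hell."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```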

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
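
A hedged reconstruction of this setup as transformers.Seq2SeqTrainingArguments is sketched below; output_dir is an illustrative name, and predict_with_generate is assumed because BLEU is reported during evaluation. The per-device batch size of 8 across 4 GPUs yields the total batch size of 32 listed above.

```python
# Hedged reconstruction of the hyperparameters above using
# transformers.Seq2SeqTrainingArguments; output_dir is assumed.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-ru",  # illustrative name
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # assumed, since BLEU is evaluated
)
```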

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-----------------:|:------:|
| No log | 0 | 0 | 16.8428 | 0 | 6.3346 | 0.3707 |
| No log | 1 | 434 | 16.1425 | 0.0078 | 6.9476 | 0.4614 |
| No log | 2 | 868 | 15.2326 | 0.0156 | 7.8733 | 0.4247 |
| No log | 3 | 1302 | 13.7919 | 0.0312 | 8.9728 | 0.4687 |
| No log | 4 | 1736 | 10.7401 | 0.0625 | 10.9514 | 0.4977 |
| 0.6279 | 5 | 2170 | 7.5709 | 0.125 | 14.9074 | 0.7105 |
| 8.0895 | 6 | 2604 | 5.1063 | 0.25 | 22.7906 | 1.1096 |
| 5.6065 | 7 | 3038 | 3.9826 | 0.5 | 37.4453 | 3.3622 |
| 4.4646 | 8 | 3472 | 3.1978 | 1.0 | 68.6496 | 2.5886 |
| 3.969 | 9 | 3906 | 2.9115 | 1.0 | 68.9235 | 3.4788 |
| 3.7369 | 10 | 4340 | 2.7801 | 1.0 | 70.0600 | 3.9344 |
| 3.58 | 11 | 4774 | 2.6955 | 1.0 | 68.3549 | 4.2262 |
| 3.4193 | 12 | 5208 | 2.6333 | 1.0 | 68.8951 | 4.5022 |
| 3.3395 | 13 | 5642 | 2.5834 | 1.0 | 68.6252 | 4.6858 |
| 3.2362 | 14 | 6076 | 2.5427 | 1.0 | 68.5016 | 4.8206 |
| 3.1812 | 15 | 6510 | 2.5083 | 1.0 | 68.8764 | 4.9863 |
| 3.1333 | 16 | 6944 | 2.4766 | 1.0 | 69.7473 | 5.1495 |
| 3.0722 | 17 | 7378 | 2.4409 | 1.0 | 69.0182 | 5.2746 |
| 3.0225 | 18 | 7812 | 2.4249 | 1.0 | 69.4311 | 5.3718 |
| 2.9148 | 19 | 8246 | 2.3973 | 1.0 | 69.1320 | 5.5334 |
| 2.9038 | 20 | 8680 | 2.3811 | 1.0 | 69.6505 | 5.6109 |
| 2.8683 | 21 | 9114 | 2.3615 | 1.0 | 69.2724 | 5.7119 |
| 2.8021 | 22 | 9548 | 2.3485 | 1.0 | 69.1170 | 5.8032 |
| 2.7796 | 23 | 9982 | 2.3283 | 1.0 | 70.3405 | 5.8634 |
| 2.7284 | 24 | 10416 | 2.3230 | 1.0 | 70.3135 | 5.9333 |
| 2.7251 | 25 | 10850 | 2.3071 | 1.0 | 70.4569 | 5.9683 |
| 2.6767 | 26 | 11284 | 2.2925 | 1.0 | 70.1294 | 6.0770 |
| 2.6636 | 27 | 11718 | 2.2789 | 1.0 | 70.4020 | 6.1429 |
| 2.5996 | 28 | 12152 | 2.2679 | 1.0 | 70.1688 | 6.1850 |
| 2.5917 | 29 | 12586 | 2.2601 | 1.0 | 70.4845 | 6.2361 |
| 2.5747 | 30 | 13020 | 2.2475 | 1.0 | 69.9181 | 6.2676 |
| 2.535 | 31 | 13454 | 2.2421 | 1.0 | 69.6476 | 6.3524 |
| 2.5088 | 32 | 13888 | 2.2290 | 1.0 | 69.9331 | 6.4197 |
| 2.5258 | 33 | 14322 | 2.2253 | 1.0 | 69.0732 | 6.4792 |
| 2.4744 | 34 | 14756 | 2.2185 | 1.0 | 69.4931 | 6.5395 |
| 2.4417 | 35 | 15190 | 2.2074 | 1.0 | 69.7491 | 6.5594 |
| 2.4442 | 36 | 15624 | 2.2061 | 1.0 | 69.7999 | 6.6204 |
| 2.4052 | 37 | 16058 | 2.1952 | 1.0 | 68.9810 | 6.7009 |
| 2.4058 | 38 | 16492 | 2.1866 | 1.0 | 69.3795 | 6.7066 |
| 2.3367 | 39 | 16926 | 2.1838 | 1.0 | 69.1257 | 6.7676 |
| 2.317 | 40 | 17360 | 2.1768 | 1.0 | 70.2986 | 6.8029 |
| 2.3178 | 41 | 17794 | 2.1735 | 1.0 | 69.7982 | 6.8361 |
| 2.2817 | 42 | 18228 | 2.1646 | 1.0 | 71.4308 | 6.8576 |
| 2.2973 | 43 | 18662 | 2.1581 | 1.0 | 70.2101 | 6.9236 |
| 2.2728 | 44 | 19096 | 2.1546 | 1.0 | 72.9310 | 6.9524 |
| 2.2672 | 45 | 19530 | 2.1521 | 1.0 | 69.0454 | 7.0105 |
| 2.2194 | 46 | 19964 | 2.1533 | 1.0 | 73.6460 | 7.0428 |
| 2.1964 | 47 | 20398 | 2.1439 | 1.0 | 70.7197 | 7.0679 |
| 2.2149 | 48 | 20832 | 2.1327 | 1.0 | 72.6235 | 7.1224 |
| 2.2026 | 49 | 21266 | 2.1374 | 1.0 | 68.5771 | 7.1457 |
| 2.1986 | 50 | 21700 | 2.1341 | 1.0 | 69.4062 | 7.1151 |
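
In the table above, the Data Size column appears to follow a progressive data schedule: the fraction of the training set doubles each epoch (0.0078 → 0.0156 → … → 0.5) until the full dataset is used from epoch 8 onward. The BLEU column is presumably computed by decoding validation predictions and scoring them with sacreBLEU; the exact metric configuration is not documented here, so the sketch below is an assumption based on a common evaluate-library setup.

```python
# Hedged sketch of how a BLEU score like the column above is commonly
# computed (assumes the sacrebleu metric from the `evaluate` library;
# the card does not document the actual metric configuration).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Небо было ясным, и звёзды ярко сияли."]       # decoded model outputs
references = [["Небо было чистым, и звёзды светили ярко."]]   # gold Russian targets
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # sacreBLEU reports on a 0-100 scale
```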

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1