10675fdf2ffcc008846278e8d48333a2

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fi-pl] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4986
  • Data Size: 1.0
  • Epoch Runtime: 13.5887
  • Bleu: 1.2326

Model description

More information needed

Intended uses & limitations

More information needed
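Although this section is unfilled, the model is by construction a Finnish-to-Polish translator. A minimal usage sketch, assuming the repo id shown in the model tree at the bottom of this card and that no task prefix was used during fine-tuning (the card confirms neither):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical repo id, taken from the model tree below.
model_id = "contemmcm/10675fdf2ffcc008846278e8d48333a2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate Finnish to Polish (the fi-pl direction this card reports BLEU for).
inputs = tokenizer("Hyvää huomenta!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the final BLEU of roughly 1.23, outputs should be treated as a demonstration of the training setup rather than as usable translations.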

Training and evaluation data

More information needed
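The card names only the dataset above. A minimal sketch of loading it with the datasets library; the "fi-pl" config follows the pair in the header, and OPUS Books ships a single train split, so the evaluation set used here would have been carved out separately:

```python
from datasets import load_dataset

# Finnish-Polish sentence pairs from OPUS Books.
books = load_dataset("Helsinki-NLP/opus_books", "fi-pl")

# Each row holds a translation dict keyed by language code.
print(books["train"][0]["translation"])  # {'fi': '...', 'pl': '...'}
```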

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
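
A sketch of the same settings expressed as Seq2SeqTrainingArguments; the output directory is hypothetical, and the multi-GPU layout (4 devices × per-device batch size 8 = total batch size 32) comes from the launcher (e.g. torchrun or accelerate), not from these arguments:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-fi-pl",  # hypothetical; not stated in the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```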

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 0 | 0 | 17.9148 | 0 | 1.7136 | 0.2824 |
| No log | 1 | 70 | 17.9378 | 0.0078 | 2.0078 | 0.2747 |
| No log | 2 | 140 | 17.7525 | 0.0156 | 2.3068 | 0.2499 |
| No log | 3 | 210 | 17.2373 | 0.0312 | 2.5152 | 0.3278 |
| No log | 4 | 280 | 16.6606 | 0.0625 | 3.0234 | 0.2880 |
| No log | 5 | 350 | 15.7679 | 0.125 | 4.0189 | 0.3841 |
| No log | 6 | 420 | 13.5984 | 0.25 | 5.4762 | 0.3811 |
| 2.6579 | 7 | 490 | 11.1682 | 0.5 | 7.9828 | 0.4340 |
| 11.6973 | 8.0 | 560 | 7.9426 | 1.0 | 14.0710 | 0.3751 |
| 9.4107 | 9.0 | 630 | 5.9708 | 1.0 | 13.4130 | 0.3371 |
| 7.1475 | 10.0 | 700 | 5.1801 | 1.0 | 11.9684 | 0.5429 |
| 6.6668 | 11.0 | 770 | 4.9393 | 1.0 | 13.1774 | 0.4793 |
| 6.3727 | 12.0 | 840 | 4.7408 | 1.0 | 12.4711 | 0.6074 |
| 5.8705 | 13.0 | 910 | 4.5561 | 1.0 | 12.3122 | 0.6628 |
| 5.6557 | 14.0 | 980 | 4.4244 | 1.0 | 12.2256 | 0.8592 |
| 5.3732 | 15.0 | 1050 | 4.2971 | 1.0 | 12.4446 | 0.7637 |
| 5.2184 | 16.0 | 1120 | 4.1745 | 1.0 | 12.6563 | 0.4102 |
| 5.0985 | 17.0 | 1190 | 4.0587 | 1.0 | 13.1597 | 0.4544 |
| 4.9163 | 18.0 | 1260 | 3.9604 | 1.0 | 12.1062 | 0.6604 |
| 4.8539 | 19.0 | 1330 | 3.9035 | 1.0 | 12.4323 | 0.6412 |
| 4.7189 | 20.0 | 1400 | 3.8508 | 1.0 | 12.7998 | 0.7608 |
| 4.6147 | 21.0 | 1470 | 3.8001 | 1.0 | 13.0418 | 0.7460 |
| 4.6074 | 22.0 | 1540 | 3.7722 | 1.0 | 13.3161 | 0.7507 |
| 4.4924 | 23.0 | 1610 | 3.7449 | 1.0 | 13.2591 | 0.7690 |
| 4.4508 | 24.0 | 1680 | 3.7120 | 1.0 | 13.6368 | 0.7716 |
| 4.3653 | 25.0 | 1750 | 3.6865 | 1.0 | 14.0664 | 0.8044 |
| 4.3382 | 26.0 | 1820 | 3.6724 | 1.0 | 12.4489 | 0.7833 |
| 4.3181 | 27.0 | 1890 | 3.6544 | 1.0 | 12.5460 | 0.7901 |
| 4.2189 | 28.0 | 1960 | 3.6379 | 1.0 | 12.9568 | 0.8633 |
| 4.1794 | 29.0 | 2030 | 3.6218 | 1.0 | 12.9708 | 0.9159 |
| 4.1435 | 30.0 | 2100 | 3.6093 | 1.0 | 13.7295 | 0.9259 |
| 4.1034 | 31.0 | 2170 | 3.6055 | 1.0 | 13.3488 | 0.8953 |
| 4.1007 | 32.0 | 2240 | 3.5849 | 1.0 | 13.7861 | 0.9364 |
| 4.0429 | 33.0 | 2310 | 3.5790 | 1.0 | 14.1256 | 0.9593 |
| 3.9885 | 34.0 | 2380 | 3.5701 | 1.0 | 14.4190 | 0.9911 |
| 3.9478 | 35.0 | 2450 | 3.5599 | 1.0 | 12.3155 | 1.0229 |
| 3.9475 | 36.0 | 2520 | 3.5523 | 1.0 | 12.7013 | 1.0155 |
| 3.9098 | 37.0 | 2590 | 3.5454 | 1.0 | 12.9939 | 1.0507 |
| 3.8731 | 38.0 | 2660 | 3.5432 | 1.0 | 13.0388 | 1.0253 |
| 3.8486 | 39.0 | 2730 | 3.5382 | 1.0 | 13.2780 | 1.0464 |
| 3.8009 | 40.0 | 2800 | 3.5336 | 1.0 | 13.4696 | 1.0871 |
| 3.7859 | 41.0 | 2870 | 3.5298 | 1.0 | 13.9548 | 1.1019 |
| 3.8035 | 42.0 | 2940 | 3.5214 | 1.0 | 13.4818 | 1.1478 |
| 3.7408 | 43.0 | 3010 | 3.5181 | 1.0 | 14.0851 | 1.1370 |
| 3.7268 | 44.0 | 3080 | 3.5147 | 1.0 | 12.4496 | 1.1332 |
| 3.6811 | 45.0 | 3150 | 3.5205 | 1.0 | 12.6512 | 1.1282 |
| 3.6578 | 46.0 | 3220 | 3.5103 | 1.0 | 12.9433 | 1.1536 |
| 3.6635 | 47.0 | 3290 | 3.5073 | 1.0 | 13.0345 | 1.1905 |
| 3.6293 | 48.0 | 3360 | 3.5117 | 1.0 | 13.1041 | 1.1758 |
| 3.6062 | 49.0 | 3430 | 3.5035 | 1.0 | 12.8431 | 1.1830 |
| 3.6127 | 50.0 | 3500 | 3.4986 | 1.0 | 13.5887 | 1.2326 |
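
The Bleu column above is presumably corpus-level BLEU on the evaluation set; the card does not say which implementation produced it, though values above 1.0 suggest the 0-100 scale. A sketch of scoring decoded outputs with SacreBLEU through the evaluate library (the prediction and reference strings are hypothetical):

```python
import evaluate

bleu = evaluate.load("sacrebleu")

predictions = ["Dzień dobry!"]   # hypothetical decoded model outputs
references = [["Dzień dobry!"]]  # hypothetical gold Polish translations
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus BLEU on a 0-100 scale
```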

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1

Safetensors: 0.6B params (F32)

Model tree for contemmcm/10675fdf2ffcc008846278e8d48333a2

  • Base model: google/umt5-small