a93eba97dfea3ff692f3ee62a5a4873a

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [it-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8385
  • Data Size: 1.0 (fraction of the training set used)
  • Epoch Runtime: 71.3281
  • Bleu: 9.6363
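
For quick use, here is a minimal inference sketch with the transformers library. The checkpoint id contemmcm/a93eba97dfea3ff692f3ee62a5a4873a is taken from this page's model tree; the source sentence and generation settings are illustrative, and the absence of a task prefix is an assumption (umT5 is not trained with T5-style task prefixes by default).

```python
# Minimal sketch, assuming the checkpoint id from this page and that the
# model translates Italian to Russian without a task prefix (umT5 has no
# default T5-style prefixes; verify against the actual training setup).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/a93eba97dfea3ff692f3ee62a5a4873a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Era una notte buia e tempestosa."  # Italian source sentence
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```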

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
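
The summary above names Helsinki-NLP/opus_books [it-ru] as the source corpus. A hedged loading sketch follows; the config name "it-ru" is inferred from the bracketed tag in this card, and the train/test split shown is illustrative rather than documented here.

```python
# Sketch only: the config name "it-ru" is inferred from the [it-ru] tag in
# this card; opus_books ships a single "train" split, so the held-out split
# below is illustrative, not taken from this card.
from datasets import load_dataset

raw = load_dataset("Helsinki-NLP/opus_books", "it-ru")
split = raw["train"].train_test_split(test_size=0.1, seed=42)
print(split["train"][0]["translation"])  # {'it': '...', 'ru': '...'}
```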

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
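
As an illustration, the hyperparameters above map onto transformers' Seq2SeqTrainingArguments as sketched below. The argument names are from the library, but the exact training script used for this model is not published, and the output_dir is a hypothetical placeholder.

```python
# Illustrative sketch of how the listed hyperparameters map onto
# Seq2SeqTrainingArguments; the actual training script is not published.
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-it-ru",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = 32 effective
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    predict_with_generate=True,      # needed to compute BLEU at eval time
)
```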

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0     | 17.1086         | 0         | 6.4889        | 0.2764 |
| No log        | 1     | 447   | 16.7062         | 0.0078    | 8.1216        | 0.2387 |
| 0.3244        | 2     | 894   | 15.9956         | 0.0156    | 7.7222        | 0.2189 |
| 0.4401        | 3     | 1341  | 14.9017         | 0.0312    | 8.9240        | 0.2174 |
| 0.6801        | 4     | 1788  | 11.0771         | 0.0625    | 11.1158       | 0.2432 |
| 1.0294        | 5     | 2235  | 7.5604          | 0.125     | 15.3990       | 0.6508 |
| 8.6763        | 6     | 2682  | 4.8568          | 0.25      | 23.1476       | 1.4250 |
| 5.4588        | 7     | 3129  | 3.7579          | 0.5       | 38.1228       | 4.0730 |
| 4.2962        | 8.0   | 3576  | 2.9575          | 1.0       | 70.9136       | 3.4979 |
| 3.7957        | 9.0   | 4023  | 2.6555          | 1.0       | 69.6879       | 4.5745 |
| 3.5378        | 10.0  | 4470  | 2.5277          | 1.0       | 70.0899       | 5.2329 |
| 3.3401        | 11.0  | 4917  | 2.4344          | 1.0       | 70.4763       | 5.6474 |
| 3.2244        | 12.0  | 5364  | 2.3658          | 1.0       | 70.6551       | 5.9999 |
| 3.0924        | 13.0  | 5811  | 2.3067          | 1.0       | 70.2377       | 6.2892 |
| 2.9826        | 14.0  | 6258  | 2.2672          | 1.0       | 71.6834       | 6.5737 |
| 2.9535        | 15.0  | 6705  | 2.2242          | 1.0       | 70.8144       | 6.7462 |
| 2.8689        | 16.0  | 7152  | 2.1933          | 1.0       | 72.8303       | 6.9514 |
| 2.8202        | 17.0  | 7599  | 2.1639          | 1.0       | 70.4897       | 7.1483 |
| 2.7643        | 18.0  | 8046  | 2.1387          | 1.0       | 71.7494       | 7.2870 |
| 2.7199        | 19.0  | 8493  | 2.1163          | 1.0       | 70.9859       | 7.3945 |
| 2.6807        | 20.0  | 8940  | 2.0955          | 1.0       | 73.8687       | 7.5529 |
| 2.6525        | 21.0  | 9387  | 2.0845          | 1.0       | 71.6488       | 7.6848 |
| 2.5591        | 22.0  | 9834  | 2.0597          | 1.0       | 71.9993       | 7.8096 |
| 2.5191        | 23.0  | 10281 | 2.0422          | 1.0       | 70.3985       | 7.9218 |
| 2.4782        | 24.0  | 10728 | 2.0310          | 1.0       | 71.3682       | 8.0161 |
| 2.4537        | 25.0  | 11175 | 2.0124          | 1.0       | 70.6184       | 8.1291 |
| 2.4013        | 26.0  | 11622 | 2.0065          | 1.0       | 71.0686       | 8.1886 |
| 2.444         | 27.0  | 12069 | 1.9869          | 1.0       | 70.3656       | 8.3094 |
| 2.3569        | 28.0  | 12516 | 1.9811          | 1.0       | 71.5175       | 8.3481 |
| 2.303         | 29.0  | 12963 | 1.9685          | 1.0       | 70.8783       | 8.4590 |
| 2.2919        | 30.0  | 13410 | 1.9608          | 1.0       | 72.1665       | 8.4929 |
| 2.283         | 31.0  | 13857 | 1.9447          | 1.0       | 70.0651       | 8.6057 |
| 2.2257        | 32.0  | 14304 | 1.9400          | 1.0       | 71.6964       | 8.6556 |
| 2.2569        | 33.0  | 14751 | 1.9354          | 1.0       | 71.7775       | 8.7164 |
| 2.2044        | 34.0  | 15198 | 1.9189          | 1.0       | 73.0560       | 8.8147 |
| 2.168         | 35.0  | 15645 | 1.9167          | 1.0       | 71.1399       | 8.8828 |
| 2.1329        | 36.0  | 16092 | 1.9045          | 1.0       | 71.0777       | 8.9717 |
| 2.0839        | 37.0  | 16539 | 1.9046          | 1.0       | 71.6059       | 8.9904 |
| 2.1337        | 38.0  | 16986 | 1.8902          | 1.0       | 70.9227       | 9.0529 |
| 2.0959        | 39.0  | 17433 | 1.8842          | 1.0       | 70.6274       | 9.1009 |
| 2.0324        | 40.0  | 17880 | 1.8749          | 1.0       | 70.5578       | 9.1943 |
| 2.0207        | 41.0  | 18327 | 1.8735          | 1.0       | 70.8847       | 9.2158 |
| 1.9929        | 42.0  | 18774 | 1.8672          | 1.0       | 71.3545       | 9.3092 |
| 2.029         | 43.0  | 19221 | 1.8667          | 1.0       | 71.9536       | 9.3463 |
| 1.9606        | 44.0  | 19668 | 1.8621          | 1.0       | 70.7463       | 9.3929 |
| 1.9386        | 45.0  | 20115 | 1.8553          | 1.0       | 70.4038       | 9.4295 |
| 1.9425        | 46.0  | 20562 | 1.8481          | 1.0       | 70.7218       | 9.4897 |
| 1.963         | 47.0  | 21009 | 1.8429          | 1.0       | 71.7010       | 9.5780 |
| 1.9085        | 48.0  | 21456 | 1.8461          | 1.0       | 70.6225       | 9.5692 |
| 1.9333        | 49.0  | 21903 | 1.8318          | 1.0       | 70.7106       | 9.6188 |
| 1.8609        | 50.0  | 22350 | 1.8385          | 1.0       | 71.3281       | 9.6363 |
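
The Data Size column shows the training-set fraction doubling each epoch until the full set is used from epoch 8 onward. The BLEU scores above are per evaluation pass; the exact BLEU implementation used for this card is not stated, but a common choice in transformers translation examples is sacrebleu via the evaluate library, sketched below with illustrative strings.

```python
# Sketch only: the card does not state which BLEU implementation was used;
# sacrebleu via the evaluate library is a common default in transformers
# translation examples. The prediction/reference strings are illustrative.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Это была тёмная и бурная ночь."]    # model outputs
references = [["Это была тёмная и бурная ночь."]]   # gold translations
print(bleu.compute(predictions=predictions, references=references)["score"])
```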

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1