53a0590d75db293170dd4d8b60a5d42f

This model is a fine-tuned version of google/mt5-small on the es-ru subset of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set (a brief inference sketch follows the list):

  • Loss: 2.2613
  • Data Size: 1.0 (fraction of the training set used in the final epoch)
  • Epoch Runtime: 66.5602 s
  • Bleu: 4.2831
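
A minimal inference sketch, assuming the checkpoint is published under the repository id shown on this card; the generation settings are illustrative, and the card does not document whether a task prefix was used during fine-tuning.

```python
# Minimal inference sketch; the repo id and generation settings are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/53a0590d75db293170dd4d8b60a5d42f"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate Spanish to Russian (no task prefix is documented for this model).
inputs = tokenizer("La vida es sueño.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```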

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
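
The card names Helsinki-NLP/opus_books (es-ru) as the fine-tuning data, so a minimal loading sketch with the `datasets` library is shown below; how the train/validation split was made is not documented here, so the inspection is illustrative only.

```python
# Sketch of loading the dataset named in this card; note that opus_books
# ships only a "train" split, so any validation split is an assumption.
from datasets import load_dataset

books = load_dataset("Helsinki-NLP/opus_books", "es-ru")
print(books)  # a DatasetDict with a "train" split

# Each example carries a `translation` dict keyed by language code.
print(books["train"][0]["translation"])  # {"es": "...", "ru": "..."}
```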

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a matching configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
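
A hedged sketch of Seq2SeqTrainingArguments mirroring the list above; the output directory is hypothetical, and the total batch size of 32 comes from launching on 4 GPUs (e.g. via torchrun or accelerate) rather than from these arguments.

```python
# Configuration sketch mirroring the listed hyperparameters; the output
# directory is hypothetical, and predict_with_generate is an assumption
# (it is the usual way to compute BLEU during evaluation).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-es-ru",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,
)
```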

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-----------------:|:------:|
| No log | 0 | 0 | 26.0882 | 0 | 5.7279 | 0.0038 |
| No log | 1 | 419 | 23.4322 | 0.0078 | 6.3451 | 0.0044 |
| No log | 2 | 838 | 19.9518 | 0.0156 | 6.9912 | 0.0045 |
| 0.6263 | 3 | 1257 | 14.4388 | 0.0312 | 8.2809 | 0.0056 |
| 0.6263 | 4 | 1676 | 10.3959 | 0.0625 | 10.2136 | 0.0076 |
| 0.9689 | 5 | 2095 | 6.8759 | 0.125 | 13.8411 | 0.0135 |
| 0.9182 | 6 | 2514 | 4.4206 | 0.25 | 21.4609 | 0.0131 |
| 4.5654 | 7 | 2933 | 3.2744 | 0.5 | 34.5283 | 0.8224 |
| 3.8563 | 8 | 3352 | 2.9757 | 1.0 | 62.8554 | 1.2310 |
| 3.6633 | 9 | 3771 | 2.8524 | 1.0 | 61.6147 | 1.4882 |
| 3.5277 | 10 | 4190 | 2.7693 | 1.0 | 64.6321 | 1.7471 |
| 3.3524 | 11 | 4609 | 2.7129 | 1.0 | 62.0201 | 1.9156 |
| 3.312 | 12 | 5028 | 2.6630 | 1.0 | 62.0960 | 2.0774 |
| 3.2036 | 13 | 5447 | 2.6264 | 1.0 | 63.2139 | 2.2092 |
| 3.195 | 14 | 5866 | 2.5942 | 1.0 | 63.6137 | 2.3323 |
| 3.0695 | 15 | 6285 | 2.5639 | 1.0 | 66.2644 | 2.4058 |
| 3.0676 | 16 | 6704 | 2.5396 | 1.0 | 63.6661 | 2.4857 |
| 3.0128 | 17 | 7123 | 2.5184 | 1.0 | 63.6598 | 2.5677 |
| 2.9509 | 18 | 7542 | 2.4970 | 1.0 | 63.0249 | 2.6038 |
| 2.919 | 19 | 7961 | 2.4749 | 1.0 | 63.8457 | 2.7655 |
| 2.8584 | 20 | 8380 | 2.4589 | 1.0 | 62.7789 | 2.8061 |
| 2.8583 | 21 | 8799 | 2.4449 | 1.0 | 63.3639 | 2.8779 |
| 2.8146 | 22 | 9218 | 2.4299 | 1.0 | 62.3622 | 2.9702 |
| 2.8195 | 23 | 9637 | 2.4227 | 1.0 | 63.2174 | 2.9814 |
| 2.7433 | 24 | 10056 | 2.4108 | 1.0 | 63.2408 | 3.0786 |
| 2.7329 | 25 | 10475 | 2.4014 | 1.0 | 63.0987 | 3.1725 |
| 2.685 | 26 | 10894 | 2.3860 | 1.0 | 61.7609 | 3.2540 |
| 2.683 | 27 | 11313 | 2.3750 | 1.0 | 63.5046 | 3.3158 |
| 2.6763 | 28 | 11732 | 2.3670 | 1.0 | 63.4249 | 3.4072 |
| 2.6408 | 29 | 12151 | 2.3599 | 1.0 | 62.8431 | 3.4535 |
| 2.6533 | 30 | 12570 | 2.3509 | 1.0 | 66.6050 | 3.4628 |
| 2.5966 | 31 | 12989 | 2.3453 | 1.0 | 66.0938 | 3.5460 |
| 2.5748 | 32 | 13408 | 2.3354 | 1.0 | 66.2647 | 3.5959 |
| 2.5456 | 33 | 13827 | 2.3327 | 1.0 | 64.5781 | 3.6653 |
| 2.5099 | 34 | 14246 | 2.3268 | 1.0 | 64.9895 | 3.6784 |
| 2.5427 | 35 | 14665 | 2.3130 | 1.0 | 64.4841 | 3.7323 |
| 2.4921 | 36 | 15084 | 2.3115 | 1.0 | 64.6163 | 3.8141 |
| 2.4378 | 37 | 15503 | 2.3093 | 1.0 | 65.5945 | 3.8321 |
| 2.4358 | 38 | 15922 | 2.3043 | 1.0 | 65.0992 | 3.9432 |
| 2.435 | 39 | 16341 | 2.2956 | 1.0 | 65.5985 | 3.9072 |
| 2.4036 | 40 | 16760 | 2.2981 | 1.0 | 65.2673 | 3.9699 |
| 2.4083 | 41 | 17179 | 2.2889 | 1.0 | 66.2053 | 3.9823 |
| 2.3789 | 42 | 17598 | 2.2859 | 1.0 | 65.5683 | 4.0351 |
| 2.391 | 43 | 18017 | 2.2806 | 1.0 | 66.7639 | 4.2193 |
| 2.3584 | 44 | 18436 | 2.2779 | 1.0 | 65.9479 | 4.0464 |
| 2.3229 | 45 | 18855 | 2.2772 | 1.0 | 65.8073 | 4.1574 |
| 2.3252 | 46 | 19274 | 2.2743 | 1.0 | 66.2474 | 4.1240 |
| 2.3353 | 47 | 19693 | 2.2706 | 1.0 | 65.2274 | 4.2012 |
| 2.2846 | 48 | 20112 | 2.2657 | 1.0 | 66.5786 | 4.1928 |
| 2.309 | 49 | 20531 | 2.2618 | 1.0 | 65.3962 | 4.2712 |
| 2.2756 | 50 | 20950 | 2.2613 | 1.0 | 66.5602 | 4.2831 |
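
The Bleu column above was presumably computed with sacrebleu, the usual backend for this metric in transformers evaluation loops; the sketch below shows the general pattern via the `evaluate` library, with placeholder strings rather than real model outputs.

```python
# Sketch of BLEU scoring via evaluate/sacrebleu; the exact metric
# configuration used for this card is an assumption.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Жизнь есть сон."]    # placeholder model output
references = [["Жизнь есть сон."]]   # one list of reference strings per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus BLEU, on the same scale as the table above
```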

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1