efa1a362d0a41e141591c0a81db31a63

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [fr-ru] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.5158
  • Data Size: 1.0 (fraction of the training data used in the final epoch)
  • Epoch Runtime: 30.4151
  • Bleu: 3.5464

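As a quick usage sketch, the snippet below loads the checkpoint for French-to-Russian translation. The repo id is taken from the model tree on this page; whether a task prefix is expected depends on how the training inputs were formatted, which the card does not specify.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id as listed in the model tree for this card.
model_id = "contemmcm/efa1a362d0a41e141591c0a81db31a63"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate a short French sentence to Russian.
inputs = tokenizer("Bonjour le monde.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```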
Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50

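The sketch below mirrors these settings as a Seq2SeqTrainingArguments object. It is an illustrative reconstruction, not the author's training script: the output directory is hypothetical, and dataset loading, preprocessing, and the Trainer call are omitted. Note that the per-device batch size of 8 across 4 GPUs accounts for the total batch size of 32.

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-fr-ru",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total train batch size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # needed to compute BLEU during eval
)
```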
Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|---------------|-------|-------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0     | 27.8653         | 0         | 3.2901        | 0.0054 |
| No log        | 1     | 204   | 27.8858         | 0.0078    | 3.5448        | 0.0052 |
| No log        | 2     | 408   | 25.7717         | 0.0156    | 4.5152        | 0.0062 |
| No log        | 3     | 612   | 24.1957         | 0.0312    | 5.0243        | 0.0057 |
| No log        | 4     | 816   | 18.9655         | 0.0625    | 5.8786        | 0.0074 |
| No log        | 5     | 1020  | 12.8708         | 0.125     | 7.4486        | 0.0133 |
| 1.54          | 6     | 1224  | 8.4704          | 0.25      | 10.8623       | 0.0169 |
| 8.8986        | 7     | 1428  | 5.0593          | 0.5       | 17.2985       | 0.0230 |
| 5.0016        | 8.0   | 1632  | 3.5442          | 1.0       | 30.8860       | 0.3609 |
| 4.4604        | 9.0   | 1836  | 3.2882          | 1.0       | 31.3445       | 0.7791 |
| 4.2289        | 10.0  | 2040  | 3.1574          | 1.0       | 30.3314       | 1.0805 |
| 4.0285        | 11.0  | 2244  | 3.0705          | 1.0       | 30.9519       | 1.2622 |
| 3.8953        | 12.0  | 2448  | 3.0100          | 1.0       | 31.2718       | 1.3434 |
| 3.7873        | 13.0  | 2652  | 2.9589          | 1.0       | 30.4523       | 1.5086 |
| 3.7034        | 14.0  | 2856  | 2.9106          | 1.0       | 30.2343       | 1.6267 |
| 3.6363        | 15.0  | 3060  | 2.8808          | 1.0       | 30.6904       | 1.7854 |
| 3.5499        | 16.0  | 3264  | 2.8463          | 1.0       | 31.3114       | 1.8462 |
| 3.4958        | 17.0  | 3468  | 2.8198          | 1.0       | 30.8767       | 1.9228 |
| 3.493         | 18.0  | 3672  | 2.7970          | 1.0       | 31.2220       | 1.9446 |
| 3.4138        | 19.0  | 3876  | 2.7729          | 1.0       | 31.0873       | 2.0582 |
| 3.3651        | 20.0  | 4080  | 2.7533          | 1.0       | 31.4935       | 2.1085 |
| 3.3197        | 21.0  | 4284  | 2.7403          | 1.0       | 32.4768       | 2.1687 |
| 3.2594        | 22.0  | 4488  | 2.7233          | 1.0       | 31.7985       | 2.2121 |
| 3.2295        | 23.0  | 4692  | 2.7061          | 1.0       | 31.3834       | 2.2961 |
| 3.1994        | 24.0  | 4896  | 2.6899          | 1.0       | 30.6403       | 2.3484 |
| 3.1855        | 25.0  | 5100  | 2.6790          | 1.0       | 31.0177       | 2.3975 |
| 3.1176        | 26.0  | 5304  | 2.6684          | 1.0       | 31.9839       | 2.4435 |
| 3.0866        | 27.0  | 5508  | 2.6551          | 1.0       | 30.7518       | 2.4625 |
| 3.0626        | 28.0  | 5712  | 2.6443          | 1.0       | 30.7101       | 2.5293 |
| 3.012         | 29.0  | 5916  | 2.6326          | 1.0       | 30.4698       | 2.5482 |
| 3.0006        | 30.0  | 6120  | 2.6236          | 1.0       | 31.1037       | 2.6454 |
| 2.9874        | 31.0  | 6324  | 2.6130          | 1.0       | 30.9122       | 2.6729 |
| 2.9616        | 32.0  | 6528  | 2.6078          | 1.0       | 30.9464       | 2.7181 |
| 2.9164        | 33.0  | 6732  | 2.6001          | 1.0       | 30.4612       | 2.7440 |
| 2.897         | 34.0  | 6936  | 2.5936          | 1.0       | 31.0837       | 2.8153 |
| 2.8612        | 35.0  | 7140  | 2.5853          | 1.0       | 32.5358       | 2.8709 |
| 2.8367        | 36.0  | 7344  | 2.5835          | 1.0       | 31.5089       | 2.8604 |
| 2.8082        | 37.0  | 7548  | 2.5729          | 1.0       | 31.9901       | 2.8850 |
| 2.8487        | 38.0  | 7752  | 2.5649          | 1.0       | 30.2207       | 2.9948 |
| 2.7835        | 39.0  | 7956  | 2.5607          | 1.0       | 30.4122       | 2.9975 |
| 2.7553        | 40.0  | 8160  | 2.5620          | 1.0       | 30.3458       | 3.0631 |
| 2.7686        | 41.0  | 8364  | 2.5487          | 1.0       | 31.2387       | 3.1216 |
| 2.7392        | 42.0  | 8568  | 2.5425          | 1.0       | 30.2285       | 3.1314 |
| 2.6999        | 43.0  | 8772  | 2.5431          | 1.0       | 31.0260       | 3.2077 |
| 2.6887        | 44.0  | 8976  | 2.5345          | 1.0       | 31.8682       | 3.2224 |
| 2.67          | 45.0  | 9180  | 2.5335          | 1.0       | 31.6668       | 3.2431 |
| 2.6897        | 46.0  | 9384  | 2.5211          | 1.0       | 30.2110       | 3.2613 |
| 2.6501        | 47.0  | 9588  | 2.5234          | 1.0       | 30.1906       | 3.3322 |
| 2.6273        | 48.0  | 9792  | 2.5212          | 1.0       | 31.1244       | 3.4368 |
| 2.6031        | 49.0  | 9996  | 2.5173          | 1.0       | 31.4978       | 3.4740 |
| 2.5725        | 50.0  | 10200 | 2.5158          | 1.0       | 30.4151       | 3.5464 |

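For reference, BLEU for seq2seq models on the Hub is commonly computed with sacrebleu through the `evaluate` library; the exact metric configuration used for this run is not documented on the card. A minimal sketch with placeholder strings:

```python
import evaluate

# Corpus-level BLEU via the sacrebleu wrapper in `evaluate`.
bleu = evaluate.load("sacrebleu")
predictions = ["Привет, мир."]        # decoded model outputs (placeholders)
references = [["Здравствуй, мир."]]   # one list of references per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus BLEU, comparable to the 3.5464 reported above
```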
Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1