4613e52fc5486ec23a4cb323d2180803

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [en-fi] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.0263
  • Data Size: 1.0
  • Epoch Runtime: 15.6265
  • Bleu: 2.6435

Model description

More information needed

Intended uses & limitations

More information needed
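The card does not document how to run the model. As a minimal, untested usage sketch (assuming the checkpoint is publicly loadable from the Hub under this repo id, and that no task prefix is required — the training preprocessing is not documented here), an English→Finnish translation call might look like:

```python
# Hypothetical usage sketch — not taken from the model card.
# Assumes the checkpoint loads via transformers' translation pipeline.
from transformers import pipeline

translator = pipeline(
    "translation_en_to_fi",
    model="contemmcm/4613e52fc5486ec23a4cb323d2180803",
)
result = translator("The cat sat on the mat.", max_length=64)
print(result[0]["translation_text"])
```

Given the final BLEU of 2.64 on opus_books, output quality should be expected to be rough.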

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
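The total batch sizes above are derived, not set directly: per-device batch size × number of devices (× gradient-accumulation steps, which default to 1 when not listed). A small sketch of that arithmetic (the helper name is illustrative, not from the training code):

```python
# Effective batch size under multi-GPU data parallelism:
# each of the 4 devices processes its own per-device batch per step.
def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    return per_device * num_devices * grad_accum

# Matches the card: train_batch_size=8 on 4 GPUs -> total_train_batch_size=32.
print(total_batch_size(8, 4))  # → 32
```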

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---------------|-------|------|-----------------|-----------|---------------|--------|
| No log        | 0     | 0    | 26.8070         | 0         | 1.8649        | 0.0019 |
| No log        | 1     | 91   | 26.6464         | 0.0078    | 2.9198        | 0.0021 |
| No log        | 2     | 182  | 25.2832         | 0.0156    | 2.7108        | 0.0024 |
| No log        | 3     | 273  | 23.9219         | 0.0312    | 3.1279        | 0.0022 |
| No log        | 4     | 364  | 22.2360         | 0.0625    | 3.4430        | 0.0025 |
| No log        | 5     | 455  | 18.5403         | 0.125     | 4.3440        | 0.0030 |
| No log        | 6     | 546  | 14.9720         | 0.25      | 6.0982        | 0.0038 |
| 2.1573        | 7     | 637  | 10.0920         | 0.5       | 9.4463        | 0.0057 |
| 10.8751       | 8.0   | 728  | 6.6132          | 1.0       | 16.2321       | 0.0266 |
| 7.1163        | 9.0   | 819  | 4.4833          | 1.0       | 14.2330       | 0.0333 |
| 5.5747        | 10.0  | 910  | 4.0264          | 1.0       | 14.7734       | 0.4297 |
| 5.0514        | 11.0  | 1001 | 3.8025          | 1.0       | 15.0448       | 0.6325 |
| 4.8985        | 12.0  | 1092 | 3.6709          | 1.0       | 15.4207       | 0.8203 |
| 4.6751        | 13.0  | 1183 | 3.5885          | 1.0       | 15.1690       | 1.1198 |
| 4.5114        | 14.0  | 1274 | 3.5234          | 1.0       | 15.3498       | 1.3589 |
| 4.3597        | 15.0  | 1365 | 3.4709          | 1.0       | 14.0717       | 1.4570 |
| 4.2411        | 16.0  | 1456 | 3.4312          | 1.0       | 14.1590       | 1.5430 |
| 4.1973        | 17.0  | 1547 | 3.3930          | 1.0       | 15.4894       | 1.5711 |
| 4.1391        | 18.0  | 1638 | 3.3629          | 1.0       | 14.7562       | 1.6843 |
| 4.0203        | 19.0  | 1729 | 3.3375          | 1.0       | 14.6328       | 1.7110 |
| 3.9747        | 20.0  | 1820 | 3.3116          | 1.0       | 14.5719       | 1.7470 |
| 3.9412        | 21.0  | 1911 | 3.2892          | 1.0       | 15.0229       | 1.8933 |
| 3.861         | 22.0  | 2002 | 3.2745          | 1.0       | 14.2382       | 1.9115 |
| 3.861         | 23.0  | 2093 | 3.2472          | 1.0       | 14.5442       | 1.9207 |
| 3.7894        | 24.0  | 2184 | 3.2347          | 1.0       | 14.7812       | 2.0149 |
| 3.7753        | 25.0  | 2275 | 3.2174          | 1.0       | 15.1183       | 2.0566 |
| 3.7182        | 26.0  | 2366 | 3.2017          | 1.0       | 14.5415       | 2.0655 |
| 3.7048        | 27.0  | 2457 | 3.1873          | 1.0       | 14.6949       | 2.1741 |
| 3.6486        | 28.0  | 2548 | 3.1744          | 1.0       | 14.9361       | 2.1382 |
| 3.6047        | 29.0  | 2639 | 3.1601          | 1.0       | 15.7903       | 2.1818 |
| 3.5836        | 30.0  | 2730 | 3.1512          | 1.0       | 14.1550       | 2.2143 |
| 3.5298        | 31.0  | 2821 | 3.1389          | 1.0       | 14.6037       | 2.2481 |
| 3.524         | 32.0  | 2912 | 3.1312          | 1.0       | 14.4274       | 2.2530 |
| 3.4796        | 33.0  | 3003 | 3.1226          | 1.0       | 15.4424       | 2.2744 |
| 3.4087        | 34.0  | 3094 | 3.1153          | 1.0       | 15.5786       | 2.3144 |
| 3.4386        | 35.0  | 3185 | 3.1058          | 1.0       | 15.4120       | 2.2947 |
| 3.3912        | 36.0  | 3276 | 3.0996          | 1.0       | 15.5306       | 2.2894 |
| 3.3854        | 37.0  | 3367 | 3.0888          | 1.0       | 14.0790       | 2.3126 |
| 3.3341        | 38.0  | 3458 | 3.0861          | 1.0       | 14.5252       | 2.3687 |
| 3.3062        | 39.0  | 3549 | 3.0738          | 1.0       | 14.3299       | 2.3899 |
| 3.3033        | 40.0  | 3640 | 3.0772          | 1.0       | 14.8870       | 2.4320 |
| 3.2496        | 41.0  | 3731 | 3.0690          | 1.0       | 15.3802       | 2.4647 |
| 3.2566        | 42.0  | 3822 | 3.0659          | 1.0       | 15.5431       | 2.4617 |
| 3.2492        | 43.0  | 3913 | 3.0568          | 1.0       | 15.3644       | 2.5582 |
| 3.2006        | 44.0  | 4004 | 3.0567          | 1.0       | 15.2988       | 2.5684 |
| 3.189         | 45.0  | 4095 | 3.0512          | 1.0       | 14.0501       | 2.6085 |
| 3.1572        | 46.0  | 4186 | 3.0428          | 1.0       | 14.6061       | 2.5761 |
| 3.1075        | 47.0  | 4277 | 3.0340          | 1.0       | 15.0633       | 2.5948 |
| 3.1011        | 48.0  | 4368 | 3.0314          | 1.0       | 15.0853       | 2.6284 |
| 3.0978        | 49.0  | 4459 | 3.0329          | 1.0       | 15.1390       | 2.5948 |
| 3.0689        | 50.0  | 4550 | 3.0263          | 1.0       | 15.6265       | 2.6435 |
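The Bleu column grows from effectively zero to 2.64 over 50 epochs, which is typical for a small model on opus_books. For readers unfamiliar with the metric, here is a simplified sentence-level BLEU sketch (uniform n-gram weights and a crude smoothing term; this is an illustration, not the scorer used to produce the numbers above):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Crude smoothing so a zero match does not zero out the score.
        p = clipped / total if clipped > 0 else 1.0 / (2 * total)
        log_precisions.append(math.log(p))
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("a b c d e", "a b c d e"))  # perfect match → 1.0
```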

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
Model size

  • 0.6B params (Safetensors, tensor type F32)
Model tree for contemmcm/4613e52fc5486ec23a4cb323d2180803

  • Base model: google/mt5-small (this model is a fine-tune of it)