e7662cb3688b6332102b286fa11a90c1

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [fi-no] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9276
  • Data Size: 1.0 (fraction of the training set in use; see the schedule in the results table below)
  • Epoch Runtime: 14.3075
  • Bleu: 2.5509
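For reference, a minimal usage sketch (not part of the original card): it loads the checkpoint through the standard transformers seq2seq API. The card does not say whether the training script prepended a task prefix, so the raw Finnish sentence is passed as-is, which is an assumption.

```python
# Hedged inference sketch. The repo id is the one this card is published
# under; passing the source sentence without a task prefix is an assumption.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/e7662cb3688b6332102b286fa11a90c1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Hyvää huomenta!", return_tensors="pt")  # Finnish source
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # Norwegian target
```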

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
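The card leaves this section empty, but the dataset is named above. A hedged loading sketch, assuming the fi-no language-pair configuration of Helsinki-NLP/opus_books is available on the Hugging Face Hub:

```python
# Hedged sketch: assumes the "fi-no" configuration exists for
# Helsinki-NLP/opus_books, as the header of this card suggests.
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus_books", "fi-no")
print(dataset["train"][0]["translation"])  # e.g. {'fi': '...', 'no': '...'}
```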

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
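For orientation, here is a sketch of how these values could map onto transformers Seq2SeqTrainingArguments. The actual training script is not included in the card, so the output directory and the use of the Seq2Seq trainer API are assumptions; the Data Size ramp visible in the results table (0.0078 up to 1.0) suggests a custom data schedule that these arguments alone do not reproduce.

```python
# Hedged mapping of the listed hyperparameters onto transformers
# Seq2SeqTrainingArguments; names not listed in the card are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-fi-no",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # needed to compute BLEU at eval time
)
```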

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 24.0511 | 0 | 1.8045 | 0.0015 |
| No log | 1 | 85 | 23.9223 | 0.0078 | 2.0959 | 0.0015 |
| No log | 2 | 170 | 23.3746 | 0.0156 | 2.4186 | 0.0016 |
| No log | 3 | 255 | 21.9056 | 0.0312 | 2.9124 | 0.0009 |
| No log | 4 | 340 | 19.2017 | 0.0625 | 3.6871 | 0.0010 |
| 1.5371 | 5 | 425 | 16.1393 | 0.125 | 4.4227 | 0.0018 |
| 1.5371 | 6 | 510 | 12.9371 | 0.25 | 5.8443 | 0.0024 |
| 5.1855 | 7 | 595 | 10.3279 | 0.5 | 8.3449 | 0.0061 |
| 11.166 | 8 | 680 | 6.8698 | 1.0 | 13.8622 | 0.0102 |
| 7.2492 | 9 | 765 | 5.1304 | 1.0 | 13.1637 | 0.0131 |
| 5.4231 | 10 | 850 | 3.8645 | 1.0 | 13.7229 | 0.2266 |
| 4.9808 | 11 | 935 | 3.5982 | 1.0 | 13.9799 | 0.6264 |
| 4.5714 | 12 | 1020 | 3.4813 | 1.0 | 14.6461 | 0.8784 |
| 4.3433 | 13 | 1105 | 3.4004 | 1.0 | 14.4856 | 1.1324 |
| 4.2422 | 14 | 1190 | 3.3477 | 1.0 | 14.6385 | 1.1699 |
| 4.1312 | 15 | 1275 | 3.3103 | 1.0 | 14.6856 | 1.2580 |
| 4.0119 | 16 | 1360 | 3.2661 | 1.0 | 13.1301 | 1.3887 |
| 3.9601 | 17 | 1445 | 3.2384 | 1.0 | 14.7256 | 1.4309 |
| 3.8771 | 18 | 1530 | 3.2175 | 1.0 | 14.9064 | 1.4782 |
| 3.8248 | 19 | 1615 | 3.1939 | 1.0 | 14.0608 | 1.5454 |
| 3.7387 | 20 | 1700 | 3.1690 | 1.0 | 14.2272 | 1.6547 |
| 3.7132 | 21 | 1785 | 3.1483 | 1.0 | 14.2225 | 1.6540 |
| 3.6829 | 22 | 1870 | 3.1310 | 1.0 | 14.7918 | 1.7029 |
| 3.5991 | 23 | 1955 | 3.1136 | 1.0 | 12.9903 | 1.7669 |
| 3.5841 | 24 | 2040 | 3.0981 | 1.0 | 13.7198 | 1.8153 |
| 3.5414 | 25 | 2125 | 3.0848 | 1.0 | 14.1757 | 1.8643 |
| 3.5293 | 26 | 2210 | 3.0739 | 1.0 | 14.6572 | 1.9215 |
| 3.4912 | 27 | 2295 | 3.0607 | 1.0 | 14.1963 | 2.0124 |
| 3.4617 | 28 | 2380 | 3.0531 | 1.0 | 14.2069 | 2.0004 |
| 3.4066 | 29 | 2465 | 3.0395 | 1.0 | 14.4273 | 2.0629 |
| 3.3653 | 30 | 2550 | 3.0303 | 1.0 | 14.9297 | 2.0936 |
| 3.3595 | 31 | 2635 | 3.0248 | 1.0 | 14.1431 | 2.0995 |
| 3.3621 | 32 | 2720 | 3.0148 | 1.0 | 14.2065 | 2.1116 |
| 3.297 | 33 | 2805 | 3.0049 | 1.0 | 14.2264 | 2.1468 |
| 3.3074 | 34 | 2890 | 3.0002 | 1.0 | 14.2656 | 2.1885 |
| 3.2554 | 35 | 2975 | 2.9919 | 1.0 | 14.7194 | 2.2099 |
| 3.2154 | 36 | 3060 | 2.9869 | 1.0 | 15.0132 | 2.2706 |
| 3.22 | 37 | 3145 | 2.9779 | 1.0 | 14.7813 | 2.3502 |
| 3.19 | 38 | 3230 | 2.9756 | 1.0 | 14.9994 | 2.3576 |
| 3.1868 | 39 | 3315 | 2.9680 | 1.0 | 13.8964 | 2.4244 |
| 3.1625 | 40 | 3400 | 2.9626 | 1.0 | 14.5477 | 2.4152 |
| 3.1328 | 41 | 3485 | 2.9590 | 1.0 | 14.5731 | 2.4350 |
| 3.0821 | 42 | 3570 | 2.9519 | 1.0 | 14.7370 | 2.4602 |
| 3.0939 | 43 | 3655 | 2.9509 | 1.0 | 14.6663 | 2.4102 |
| 3.0574 | 44 | 3740 | 2.9506 | 1.0 | 14.7435 | 2.4589 |
| 3.059 | 45 | 3825 | 2.9447 | 1.0 | 15.4199 | 2.5004 |
| 3.0281 | 46 | 3910 | 2.9405 | 1.0 | 15.5515 | 2.5317 |
| 2.9901 | 47 | 3995 | 2.9367 | 1.0 | 13.5553 | 2.5128 |
| 2.9802 | 48 | 4080 | 2.9347 | 1.0 | 13.5999 | 2.5122 |
| 2.9488 | 49 | 4165 | 2.9301 | 1.0 | 14.0084 | 2.5872 |
| 2.9433 | 50 | 4250 | 2.9276 | 1.0 | 14.3075 | 2.5509 |
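The card does not state which BLEU implementation produced the Bleu column above; a common choice is sacrebleu via the evaluate library, sketched below as an assumption.

```python
# Hedged BLEU sketch: sacrebleu via the evaluate library is assumed,
# as the card does not name the implementation behind its Bleu column.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["God morgen!"]    # model outputs (Norwegian)
references = [["God morgen!"]]   # one or more reference translations each
print(bleu.compute(predictions=predictions, references=references)["score"])
```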

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1