4d4d7cb2f1bfc96c25e98c54d0997d8a

This model is a fine-tuned version of google/mt5-small on the Helsinki-NLP/opus_books [en-es] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8516
  • Data Size: 1.0 (fraction of the training data used)
  • Epoch Runtime: 327.4974 seconds
  • Bleu: 8.2038

Model description

More information needed

Intended uses & limitations

More information needed
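
Given the dataset named in the summary, the model targets English-to-Spanish translation. Below is a minimal usage sketch with the transformers library; whether a task prefix (e.g. "translate English to Spanish: ") was used during fine-tuning is not documented on this card, so none is assumed, and the generation settings are illustrative.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo id taken from this card.
model_id = "contemmcm/4d4d7cb2f1bfc96c25e98c54d0997d8a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# No task prefix assumed; add one here if the training setup used it.
inputs = tokenizer("The old man walked along the shore.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```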

Training and evaluation data

More information needed
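
The summary above identifies the data as Helsinki-NLP/opus_books [en-es]. A minimal sketch of loading it with the datasets library follows; the train/validation split and any preprocessing used for this run are not documented, so the access pattern shown is illustrative.

```python
from datasets import load_dataset

# The "en-es" config of opus_books ships a single "train" split;
# how it was partitioned for evaluation in this run is not documented.
dataset = load_dataset("Helsinki-NLP/opus_books", "en-es")

# Each example is a {"translation": {"en": ..., "es": ...}} pair.
print(dataset["train"][0]["translation"])
```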

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
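
The list above maps onto transformers' Seq2SeqTrainingArguments roughly as in the sketch below. Only the values shown in the list come from this card; output_dir and predict_with_generate are illustrative assumptions, and the 4-GPU run would typically be launched with torchrun --nproc_per_node=4. Note that 4 devices × a per-device batch size of 8 gives the listed total batch size of 32.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-en-es",  # assumed name, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # assumed; needed to compute BLEU at eval time
)
```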

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|:-------------:|:-----:|:------:|:---------------:|:---------:|:-----------------:|:------:|
| No log | 0 | 0 | 26.2854 | 0 | 27.9145 | 0.0017 |
| No log | 1 | 2336 | 18.9281 | 0.0078 | 32.4046 | 0.0018 |
| 0.3263 | 2 | 4672 | 11.7900 | 0.0156 | 34.2716 | 0.0036 |
| 0.3274 | 3 | 7008 | 6.1686 | 0.0312 | 39.1650 | 0.0218 |
| 5.361 | 4 | 9344 | 3.6553 | 0.0625 | 49.7337 | 0.0868 |
| 4.2979 | 5 | 11680 | 3.1433 | 0.125 | 68.3465 | 1.1067 |
| 3.7824 | 6 | 14016 | 2.9144 | 0.25 | 106.6005 | 1.7041 |
| 3.4797 | 7 | 16352 | 2.7169 | 0.5 | 183.0667 | 2.3146 |
| 3.1554 | 8 | 18688 | 2.5379 | 1.0 | 336.3692 | 3.7288 |
| 2.9481 | 9 | 21024 | 2.4301 | 1.0 | 335.9351 | 4.3518 |
| 2.8347 | 10 | 23360 | 2.3613 | 1.0 | 328.1369 | 4.9596 |
| 2.756 | 11 | 25696 | 2.3131 | 1.0 | 327.7411 | 5.2984 |
| 2.6544 | 12 | 28032 | 2.2728 | 1.0 | 329.1862 | 5.5852 |
| 2.5893 | 13 | 30368 | 2.2285 | 1.0 | 330.7622 | 5.7019 |
| 2.5538 | 14 | 32704 | 2.1961 | 1.0 | 330.6755 | 5.7741 |
| 2.4931 | 15 | 35040 | 2.1711 | 1.0 | 335.6781 | 6.0698 |
| 2.4965 | 16 | 37376 | 2.1462 | 1.0 | 333.3473 | 6.1122 |
| 2.4119 | 17 | 39712 | 2.1221 | 1.0 | 333.6272 | 6.2744 |
| 2.4059 | 18 | 42048 | 2.1039 | 1.0 | 332.4366 | 6.4836 |
| 2.3692 | 19 | 44384 | 2.0904 | 1.0 | 329.6484 | 6.5465 |
| 2.3331 | 20 | 46720 | 2.0651 | 1.0 | 330.2129 | 6.7108 |
| 2.2888 | 21 | 49056 | 2.0572 | 1.0 | 332.0176 | 6.7480 |
| 2.2682 | 22 | 51392 | 2.0413 | 1.0 | 332.2873 | 6.7908 |
| 2.2431 | 23 | 53728 | 2.0299 | 1.0 | 330.3732 | 6.9255 |
| 2.1991 | 24 | 56064 | 2.0156 | 1.0 | 331.4767 | 7.0141 |
| 2.2014 | 25 | 58400 | 2.0036 | 1.0 | 330.0735 | 7.1629 |
| 2.1636 | 26 | 60736 | 1.9884 | 1.0 | 329.7091 | 7.2376 |
| 2.1585 | 27 | 63072 | 1.9843 | 1.0 | 331.1367 | 7.1811 |
| 2.1246 | 28 | 65408 | 1.9690 | 1.0 | 332.0541 | 7.2945 |
| 2.1024 | 29 | 67744 | 1.9667 | 1.0 | 329.9478 | 7.4214 |
| 2.0918 | 30 | 70080 | 1.9618 | 1.0 | 332.9395 | 7.4096 |
| 2.0791 | 31 | 72416 | 1.9488 | 1.0 | 336.1133 | 7.5346 |
| 2.0916 | 32 | 74752 | 1.9397 | 1.0 | 330.8274 | 7.5941 |
| 2.0439 | 33 | 77088 | 1.9333 | 1.0 | 330.4461 | 7.5373 |
| 2.0485 | 34 | 79424 | 1.9321 | 1.0 | 329.3917 | 7.6020 |
| 2.0266 | 35 | 81760 | 1.9202 | 1.0 | 331.9756 | 7.7666 |
| 2.0335 | 36 | 84096 | 1.9151 | 1.0 | 329.3808 | 7.8134 |
| 1.9908 | 37 | 86432 | 1.9068 | 1.0 | 331.5035 | 7.7518 |
| 2.0021 | 38 | 88768 | 1.9069 | 1.0 | 334.6303 | 7.8101 |
| 2.0001 | 39 | 91104 | 1.8963 | 1.0 | 328.9622 | 7.8611 |
| 1.9735 | 40 | 93440 | 1.8955 | 1.0 | 328.4680 | 7.8629 |
| 1.9618 | 41 | 95776 | 1.8912 | 1.0 | 336.6663 | 7.9950 |
| 1.9276 | 42 | 98112 | 1.8850 | 1.0 | 328.1633 | 7.9522 |
| 1.9324 | 43 | 100448 | 1.8807 | 1.0 | 331.0138 | 7.9116 |
| 1.8694 | 44 | 102784 | 1.8756 | 1.0 | 330.6471 | 8.0294 |
| 1.9165 | 45 | 105120 | 1.8682 | 1.0 | 331.4836 | 8.1219 |
| 1.8714 | 46 | 107456 | 1.8655 | 1.0 | 329.9200 | 8.1032 |
| 1.8841 | 47 | 109792 | 1.8599 | 1.0 | 326.8646 | 8.1485 |
| 1.855 | 48 | 112128 | 1.8633 | 1.0 | 331.9095 | 8.1450 |
| 1.8642 | 49 | 114464 | 1.8543 | 1.0 | 331.2229 | 8.1218 |
| 1.8601 | 50 | 116800 | 1.8516 | 1.0 | 327.4974 | 8.2038 |
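
Scores like those in the Bleu column can be computed with the evaluate library's sacrebleu metric; a minimal sketch follows, with the caveat that the exact metric configuration used for this run is not documented on the card.

```python
import evaluate

bleu = evaluate.load("sacrebleu")

# Illustrative inputs: decoded model outputs paired with one reference each.
predictions = ["El viejo caminaba por la orilla."]
references = [["El anciano caminaba por la orilla."]]

# compute() returns a dict; "score" is the corpus-level BLEU.
print(bleu.compute(predictions=predictions, references=references)["score"])
```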

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
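
A small illustrative snippet for checking a local environment against the versions listed above:

```python
import transformers, torch, datasets, tokenizers

# Expected per this card: 4.57.0, 2.8.0+cu128, 4.2.0, 0.22.1
for name, mod in [("Transformers", transformers), ("Pytorch", torch),
                  ("Datasets", datasets), ("Tokenizers", tokenizers)]:
    print(name, mod.__version__)
```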