acbacdfa50c20b11ca4810175f066222

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [en-es] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2036
  • Data Size: 1.0
  • Epoch Runtime: 369.2799
  • Bleu: 7.6026
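To put the loss in perspective: assuming the reported eval Loss is a mean token-level cross-entropy in nats (the usual Hugging Face Trainer convention; an assumption, not stated in this card), it corresponds to a perplexity of exp(loss):

```python
import math

# Assumption: eval Loss is mean token-level cross-entropy in nats,
# so perplexity = exp(loss).
eval_loss = 2.2036
perplexity = math.exp(eval_loss)
print(round(perplexity, 2))  # 9.06
```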

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
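The total batch sizes above follow directly from the per-device settings under data-parallel training, where each of the 4 devices processes its own batch per step:

```python
# Sketch: total batch size under data-parallel training is the
# per-device batch size multiplied by the number of devices.
per_device_train_batch_size = 8
per_device_eval_batch_size = 8
num_devices = 4  # multi-GPU, one process per device

total_train_batch_size = per_device_train_batch_size * num_devices
total_eval_batch_size = per_device_eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 32
```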

Training results

| Training Loss | Epoch | Step   | Validation Loss | Data Size | Epoch Runtime | Bleu   |
|:-------------:|:-----:|:------:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0      | 16.1008         | 0         | 30.7191       | 0.1376 |
| No log        | 1     | 2336   | 15.4111         | 0.0078    | 33.9916       | 0.1382 |
| 0.2628        | 2     | 4672   | 11.6741         | 0.0156    | 36.9915       | 0.1409 |
| 0.3107        | 3     | 7008   | 7.5072          | 0.0312    | 42.3120       | 0.2862 |
| 6.9196        | 4     | 9344   | 4.6997          | 0.0625    | 53.1233       | 1.4737 |
| 5.3873        | 5     | 11680  | 3.9288          | 0.125     | 73.4840       | 2.1529 |
| 4.487         | 6     | 14016  | 3.4148          | 0.25      | 116.9824      | 2.5966 |
| 4.0544        | 7     | 16352  | 3.1414          | 0.5       | 199.2163      | 3.2885 |
| 3.6842        | 8.0   | 18688  | 2.9427          | 1.0       | 364.7632      | 4.0592 |
| 3.4452        | 9.0   | 21024  | 2.8255          | 1.0       | 364.2480      | 4.5240 |
| 3.3093        | 10.0  | 23360  | 2.7579          | 1.0       | 363.0636      | 4.8231 |
| 3.2422        | 11.0  | 25696  | 2.7009          | 1.0       | 364.7663      | 5.0897 |
| 3.1326        | 12.0  | 28032  | 2.6510          | 1.0       | 364.8785      | 5.3149 |
| 3.0436        | 13.0  | 30368  | 2.6195          | 1.0       | 362.5315      | 5.4792 |
| 3.0106        | 14.0  | 32704  | 2.5788          | 1.0       | 362.0513      | 5.6407 |
| 2.9365        | 15.0  | 35040  | 2.5521          | 1.0       | 362.4158      | 5.7836 |
| 2.9626        | 16.0  | 37376  | 2.5179          | 1.0       | 362.7865      | 5.9337 |
| 2.8353        | 17.0  | 39712  | 2.4982          | 1.0       | 365.3876      | 6.0555 |
| 2.8446        | 18.0  | 42048  | 2.4835          | 1.0       | 364.7898      | 6.1210 |
| 2.8111        | 19.0  | 44384  | 2.4641          | 1.0       | 371.6591      | 6.2001 |
| 2.7674        | 20.0  | 46720  | 2.4445          | 1.0       | 383.2359      | 6.2968 |
| 2.7096        | 21.0  | 49056  | 2.4256          | 1.0       | 385.3468      | 6.3968 |
| 2.6922        | 22.0  | 51392  | 2.4156          | 1.0       | 385.5504      | 6.4567 |
| 2.652         | 23.0  | 53728  | 2.3956          | 1.0       | 384.0397      | 6.5348 |
| 2.5982        | 24.0  | 56064  | 2.3923          | 1.0       | 381.7050      | 6.5862 |
| 2.6054        | 25.0  | 58400  | 2.3686          | 1.0       | 386.3455      | 6.6673 |
| 2.5553        | 26.0  | 60736  | 2.3560          | 1.0       | 383.9699      | 6.7479 |
| 2.5704        | 27.0  | 63072  | 2.3517          | 1.0       | 379.6998      | 6.7864 |
| 2.5291        | 28.0  | 65408  | 2.3349          | 1.0       | 385.1407      | 6.8487 |
| 2.495         | 29.0  | 67744  | 2.3311          | 1.0       | 380.7480      | 6.8744 |
| 2.5026        | 30.0  | 70080  | 2.3220          | 1.0       | 367.5770      | 6.9267 |
| 2.4608        | 31.0  | 72416  | 2.3145          | 1.0       | 363.9416      | 6.9729 |
| 2.4994        | 32.0  | 74752  | 2.2948          | 1.0       | 366.9948      | 7.0238 |
| 2.4301        | 33.0  | 77088  | 2.2908          | 1.0       | 366.0870      | 7.0714 |
| 2.4414        | 34.0  | 79424  | 2.2933          | 1.0       | 366.1123      | 7.1139 |
| 2.4087        | 35.0  | 81760  | 2.2787          | 1.0       | 364.9032      | 7.1458 |
| 2.4174        | 36.0  | 84096  | 2.2691          | 1.0       | 364.1023      | 7.1781 |
| 2.3635        | 37.0  | 86432  | 2.2636          | 1.0       | 362.4927      | 7.2447 |
| 2.3803        | 38.0  | 88768  | 2.2673          | 1.0       | 365.5410      | 7.2437 |
| 2.3771        | 39.0  | 91104  | 2.2526          | 1.0       | 366.7453      | 7.2992 |
| 2.365         | 40.0  | 93440  | 2.2541          | 1.0       | 364.3133      | 7.3154 |
| 2.339         | 41.0  | 95776  | 2.2431          | 1.0       | 366.3519      | 7.3661 |
| 2.3013        | 42.0  | 98112  | 2.2383          | 1.0       | 370.7902      | 7.3864 |
| 2.2845        | 43.0  | 100448 | 2.2416          | 1.0       | 368.4315      | 7.4176 |
| 2.2383        | 44.0  | 102784 | 2.2310          | 1.0       | 365.7991      | 7.4575 |
| 2.2968        | 45.0  | 105120 | 2.2163          | 1.0       | 368.7937      | 7.4876 |
| 2.2331        | 46.0  | 107456 | 2.2186          | 1.0       | 376.4598      | 7.5047 |
| 2.2592        | 47.0  | 109792 | 2.2192          | 1.0       | 368.2389      | 7.5199 |
| 2.2147        | 48.0  | 112128 | 2.2200          | 1.0       | 370.3951      | 7.5408 |
| 2.2242        | 49.0  | 114464 | 2.2059          | 1.0       | 373.3496      | 7.5755 |
| 2.2101        | 50.0  | 116800 | 2.2036          | 1.0       | 369.2799      | 7.6026 |
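The Bleu column tracks translation quality as a geometric mean of clipped n-gram precisions with a brevity penalty. A minimal self-contained sketch of that metric (single reference per hypothesis, no smoothing; real evaluations typically use a library such as sacreBLEU, and this card does not state which implementation produced the scores above):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100): geometric mean of clipped n-gram
    precisions times a brevity penalty. Simplified: one reference per
    hypothesis, no smoothing (any zero precision gives BLEU = 0)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum((h & r).values())  # clipped counts
            totals[n - 1] += sum(h.values())
    if 0 in matches:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

For example, an exact match scores 100, while a hypothesis sharing no n-grams with its reference scores 0.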

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1