452912cb71a3117bd0e57d0415d00578

This model is a fine-tuned version of google/umt5-small on the it-nl (Italian-Dutch) subset of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

Loss: 3.4272
Data Size: 1.0
Epoch Runtime: 10.4052
Bleu: 2.4409
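
For a rough sense of scale, the evaluation loss can be converted to perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is what the Transformers Trainer usually logs for sequence-to-sequence models; this card does not state it explicitly.

```python
import math

# Evaluation loss reported in this card.
# Assumption: mean per-token cross-entropy, measured in nats.
eval_loss = 3.4272

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)
print(f"perplexity ~ {perplexity:.1f}")
```

Under that assumption the model is, on average, about as uncertain as a uniform choice over roughly thirty tokens at each step, which is consistent with the low BLEU score.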

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
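
The total batch sizes and the final step count follow from the per-device values. The arithmetic below is a minimal sketch; `gradient_accumulation_steps = 1` is an assumption (the card lists no accumulation setting), and the figure of 58 steps per epoch is read from the Step column of the training results table.

```python
# Values copied from the hyperparameter list.
train_batch_size = 8   # per device
eval_batch_size = 8    # per device
num_devices = 4
num_epochs = 50

# Assumption: no gradient accumulation, since none is listed in the card.
gradient_accumulation_steps = 1

# Effective batch sizes across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# The training results table logs 58 optimizer steps per epoch.
steps_per_epoch = 58
total_steps = steps_per_epoch * num_epochs

print(total_train_batch_size, total_eval_batch_size, total_steps)
```

These values match the reported total_train_batch_size and total_eval_batch_size of 32 and the final step of 2900.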

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	14.6493	0	1.6677	0.0326
No log	1	58	14.5310	0.0078	2.1460	0.0343
No log	2	116	14.3887	0.0156	2.6500	0.0316
No log	3	174	14.0886	0.0312	3.1482	0.0355
No log	4	232	13.6743	0.0625	3.6247	0.0326
No log	5	290	12.8903	0.125	4.5228	0.0344
1.4522	6	348	11.3777	0.25	5.7006	0.0411
1.984	7	406	8.3046	0.5	7.8283	0.0696
7.3815	8.0	464	6.3708	1.0	12.8800	0.1886
7.3864	9.0	522	5.1439	1.0	12.3850	0.7054
6.5349	10.0	580	4.6842	1.0	13.1245	1.2822
5.9802	11.0	638	4.4774	1.0	10.3956	1.5697
5.6194	12.0	696	4.3456	1.0	11.6056	1.9396
5.2318	13.0	754	4.2309	1.0	11.3041	2.2101
5.0925	14.0	812	4.1525	1.0	11.2231	2.4791
4.9603	15.0	870	4.0755	1.0	11.2407	2.3625
4.8762	16.0	928	4.0107	1.0	11.2251	1.1962
4.8013	17.0	986	3.9550	1.0	11.3481	1.2647
4.7091	18.0	1044	3.9082	1.0	11.7436	1.2701
4.5885	19.0	1102	3.8637	1.0	12.4707	1.3501
4.5338	20.0	1160	3.8302	1.0	11.0649	1.3590
4.483	21.0	1218	3.7975	1.0	11.3822	1.4467
4.4465	22.0	1276	3.7642	1.0	11.4813	1.5215
4.4019	23.0	1334	3.7429	1.0	12.6424	1.5879
4.3627	24.0	1392	3.7166	1.0	12.2501	1.6346
4.2885	25.0	1450	3.6970	1.0	12.2352	1.6814
4.264	26.0	1508	3.6770	1.0	12.2031	1.6437
4.2211	27.0	1566	3.6579	1.0	12.1398	1.6338
4.2011	28.0	1624	3.6384	1.0	14.2453	1.7182
4.1774	29.0	1682	3.6227	1.0	10.7917	1.7401
4.1455	30.0	1740	3.6118	1.0	10.7680	1.8388
4.1121	31.0	1798	3.5978	1.0	10.9660	1.8051
4.0668	32.0	1856	3.5818	1.0	10.9086	1.8678
4.0435	33.0	1914	3.5727	1.0	10.7405	1.9157
4.0312	34.0	1972	3.5617	1.0	11.6127	1.9140
3.9924	35.0	2030	3.5475	1.0	11.7059	2.0078
3.9725	36.0	2088	3.5382	1.0	11.3898	2.0917
3.9606	37.0	2146	3.5261	1.0	11.2871	2.0731
3.9178	38.0	2204	3.5179	1.0	12.1370	2.2044
3.8868	39.0	2262	3.5112	1.0	10.5894	2.1897
3.9015	40.0	2320	3.4991	1.0	10.9718	2.2482
3.8706	41.0	2378	3.4914	1.0	11.1840	2.2535
3.8446	42.0	2436	3.4867	1.0	11.2884	2.2190
3.8325	43.0	2494	3.4798	1.0	11.9500	2.2815
3.805	44.0	2552	3.4741	1.0	11.8427	2.2715
3.7734	45.0	2610	3.4619	1.0	12.1536	2.2864
3.7753	46.0	2668	3.4584	1.0	13.4007	2.3996
3.7515	47.0	2726	3.4531	1.0	13.0900	2.3798
3.7515	48.0	2784	3.4402	1.0	13.0615	2.4263
3.7215	49.0	2842	3.4357	1.0	10.5903	2.4165
3.6864	50.0	2900	3.4272	1.0	10.4052	2.4409
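
Validation loss decreases at every epoch in the table, but BLEU does not: it peaks at epoch 14 (2.4791), drops sharply at epoch 16, and only recovers to 2.4409 by epoch 50. The sketch below shows one way to check this from the tab-separated rows; `EXCERPT` is a hypothetical name holding a handful of rows copied from the table, not the full log.

```python
import csv
import io

# A few rows copied from the training results table (tab-separated, same columns).
EXCERPT = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tData Size\tEpoch Runtime\tBleu\n"
    "5.2318\t13.0\t754\t4.2309\t1.0\t11.3041\t2.2101\n"
    "5.0925\t14.0\t812\t4.1525\t1.0\t11.2231\t2.4791\n"
    "4.9603\t15.0\t870\t4.0755\t1.0\t11.2407\t2.3625\n"
    "4.8762\t16.0\t928\t4.0107\t1.0\t11.2251\t1.1962\n"
    "3.7215\t49.0\t2842\t3.4357\t1.0\t10.5903\t2.4165\n"
    "3.6864\t50.0\t2900\t3.4272\t1.0\t10.4052\t2.4409\n"
)

# Parse the rows into dictionaries keyed by column name.
rows = list(csv.DictReader(io.StringIO(EXCERPT), delimiter="\t"))

# Checkpoint with the highest BLEU and the one with the lowest validation loss.
best_bleu = max(rows, key=lambda r: float(r["Bleu"]))
best_loss = min(rows, key=lambda r: float(r["Validation Loss"]))

print("best BLEU at epoch", best_bleu["Epoch"], "->", best_bleu["Bleu"])
print("lowest validation loss at epoch", best_loss["Epoch"], "->", best_loss["Validation Loss"])
```

Because the two criteria select different epochs, which checkpoint counts as best depends on the metric chosen; the results reported at the top of this card are those of the final epoch.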

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Safetensors

Model size: 0.6B params
Tensor type: F32

Model tree for contemmcm/452912cb71a3117bd0e57d0415d00578

Base model: google/umt5-small
Finetuned: this model