2b0c94697808ec3de2d5f1b05bf05849

This model is a fine-tuned version of google/umt5-small (~0.6B parameters, FP32 weights) on the de-pt pair of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

  • Loss: 3.1001
  • Data Size: 1.0
  • Epoch Runtime: 6.2271
  • Bleu: 4.1387
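
The sketch below shows one way to run the model for German-to-Portuguese translation. It is an assumption, not a documented usage pattern: the card does not say how inputs were formatted during fine-tuning, so plain source text is assumed, and the repository id is the one shown on this page.

```python
# Minimal inference sketch (assumptions: repository id below, plain source
# text as input; the example sentence is illustrative).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/2b0c94697808ec3de2d5f1b05bf05849"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Translate German -> Portuguese.
inputs = tokenizer("Das Buch liegt auf dem Tisch.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```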

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
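
As a starting point, here is a minimal sketch of loading the de-pt pair named above, assuming the Hugging Face datasets library. How the evaluation split was carved out is not documented, so only the raw train split is shown.

```python
# Minimal sketch of loading the de-pt pair of opus_books (the dataset ships
# a single train split; the eval split used for this card is undocumented).
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus_books", "de-pt")
print(dataset["train"][0]["translation"])  # e.g. {'de': '...', 'pt': '...'}
```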

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
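
These map onto a standard Seq2SeqTrainingArguments configuration. The sketch below is an assumption of how the run was set up, not the exact training script; the output path and launch method are illustrative.

```python
# Sketch of the listed hyperparameters as Seq2SeqTrainingArguments.
# Assumption: launched across 4 GPUs (e.g. via torchrun or accelerate), so
# the per-device batch size of 8 yields the total batch size of 32 above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-de-pt",  # hypothetical output path
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,      # generate translations for BLEU
)
```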

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log        | 0     | 0    | 16.2689         | 0         | 1.1554        | 0.4356 |
| No log        | 1     | 27   | 15.9789         | 0.0078    | 1.4234        | 0.3459 |
| No log        | 2     | 54   | 15.8901         | 0.0156    | 1.8919        | 0.3310 |
| No log        | 3     | 81   | 15.7364         | 0.0312    | 2.2802        | 0.5092 |
| No log        | 4     | 108  | 15.5191         | 0.0625    | 2.2502        | 0.3992 |
| No log        | 5     | 135  | 15.1460         | 0.125     | 2.5648        | 0.3839 |
| No log        | 6     | 162  | 14.4777         | 0.25      | 3.2894        | 0.3199 |
| No log        | 7     | 189  | 13.1401         | 0.5       | 4.6359        | 0.3334 |
| 3.644         | 8.0   | 216  | 11.0998         | 1.0       | 7.0281        | 0.3472 |
| 3.644         | 9.0   | 243  | 9.3550          | 1.0       | 6.8696        | 0.3811 |
| 13.0011       | 10.0  | 270  | 8.0799          | 1.0       | 6.7935        | 0.3923 |
| 13.0011       | 11.0  | 297  | 7.1932          | 1.0       | 7.0847        | 0.2627 |
| 9.796         | 12.0  | 324  | 6.5080          | 1.0       | 7.0138        | 0.3121 |
| 7.9769        | 13.0  | 351  | 5.7633          | 1.0       | 7.6384        | 0.6407 |
| 7.9769        | 14.0  | 378  | 5.0105          | 1.0       | 7.3221        | 0.9045 |
| 6.9403        | 15.0  | 405  | 4.7115          | 1.0       | 5.4665        | 1.2322 |
| 6.9403        | 16.0  | 432  | 4.5795          | 1.0       | 5.9862        | 2.2563 |
| 6.2452        | 17.0  | 459  | 4.4623          | 1.0       | 6.2203        | 2.7302 |
| 6.2452        | 18.0  | 486  | 4.3420          | 1.0       | 6.2215        | 3.2068 |
| 5.7946        | 19.0  | 513  | 4.2593          | 1.0       | 6.3992        | 3.2921 |
| 5.7946        | 20.0  | 540  | 4.1774          | 1.0       | 7.2450        | 3.2180 |
| 5.491         | 21.0  | 567  | 4.0912          | 1.0       | 6.7042        | 3.3344 |
| 5.491         | 22.0  | 594  | 3.9983          | 1.0       | 6.7033        | 1.6987 |
| 5.2207        | 23.0  | 621  | 3.9346          | 1.0       | 7.0602        | 1.3817 |
| 5.2207        | 24.0  | 648  | 3.8750          | 1.0       | 7.1286        | 1.3053 |
| 5.0132        | 25.0  | 675  | 3.8021          | 1.0       | 7.0340        | 1.3828 |
| 4.8139        | 26.0  | 702  | 3.7475          | 1.0       | 7.4569        | 1.4485 |
| 4.8139        | 27.0  | 729  | 3.6965          | 1.0       | 7.4954        | 1.5240 |
| 4.6791        | 28.0  | 756  | 3.6504          | 1.0       | 7.3945        | 1.3944 |
| 4.6791        | 29.0  | 783  | 3.5977          | 1.0       | 7.9859        | 1.1471 |
| 4.524         | 30.0  | 810  | 3.5524          | 1.0       | 5.7725        | 1.0695 |
| 4.524         | 31.0  | 837  | 3.5003          | 1.0       | 6.2132        | 1.0906 |
| 4.3751        | 32.0  | 864  | 3.4662          | 1.0       | 6.2070        | 1.1368 |
| 4.3751        | 33.0  | 891  | 3.4301          | 1.0       | 6.0684        | 1.1827 |
| 4.2701        | 34.0  | 918  | 3.3905          | 1.0       | 6.2441        | 1.2379 |
| 4.2701        | 35.0  | 945  | 3.3597          | 1.0       | 6.5993        | 1.2587 |
| 4.1623        | 36.0  | 972  | 3.3262          | 1.0       | 6.9220        | 1.4276 |
| 4.1623        | 37.0  | 999  | 3.3008          | 1.0       | 6.9956        | 4.1191 |
| 4.0752        | 38.0  | 1026 | 3.2733          | 1.0       | 6.9088        | 4.6422 |
| 3.9817        | 39.0  | 1053 | 3.2551          | 1.0       | 7.2268        | 4.2248 |
| 3.9817        | 40.0  | 1080 | 3.2284          | 1.0       | 7.5810        | 3.8563 |
| 3.9155        | 41.0  | 1107 | 3.2186          | 1.0       | 7.8967        | 3.9903 |
| 3.9155        | 42.0  | 1134 | 3.2020          | 1.0       | 7.5043        | 3.8938 |
| 3.8444        | 43.0  | 1161 | 3.1908          | 1.0       | 7.5884        | 3.9440 |
| 3.8444        | 44.0  | 1188 | 3.1702          | 1.0       | 8.4088        | 3.9587 |
| 3.7853        | 45.0  | 1215 | 3.1521          | 1.0       | 5.7945        | 4.0313 |
| 3.7853        | 46.0  | 1242 | 3.1432          | 1.0       | 5.7534        | 4.0525 |
| 3.7083        | 47.0  | 1269 | 3.1282          | 1.0       | 5.7461        | 4.0782 |
| 3.7083        | 48.0  | 1296 | 3.1237          | 1.0       | 5.8734        | 4.1569 |
| 3.6513        | 49.0  | 1323 | 3.1040          | 1.0       | 6.1703        | 4.1645 |
| 3.5786        | 50.0  | 1350 | 3.1001          | 1.0       | 6.2271        | 4.1387 |
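
The Data Size column reflects a progressive schedule: training starts on a small fraction of the corpus and doubles it each epoch, reaching the full dataset at epoch 8. Validation loss then decreases steadily under the constant learning rate, while Bleu fluctuates before settling near 4 over the final epochs.

The card does not state how Bleu was computed; a common choice is sacreBLEU via the evaluate library, sketched below with placeholder sentences.

```python
# Minimal sacreBLEU sketch (assumption: this is how Bleu was scored; the
# prediction/reference sentences below are placeholders).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["O livro está sobre a mesa."]
references = [["O livro está em cima da mesa."]]
print(bleu.compute(predictions=predictions, references=references)["score"])
```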

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1