ddd4a6aa06d6ae42631b8799ef97d92a

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fr-pl] dataset. It achieves the following results on the evaluation set:

  • Loss: 3.4500
  • Data Size: 1.0
  • Epoch Runtime: 12.7105
  • BLEU: 1.3591

Model description

More information needed

Intended uses & limitations

More information needed
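
Pending fuller documentation, the checkpoint can be exercised as a French-to-Polish translator through the standard transformers seq2seq API. The sketch below is illustrative only: the repo id is taken from this page, and it assumes no task prefix was used during fine-tuning (the card does not say). Given the final BLEU of roughly 1.36, outputs should be treated as a demonstration of the training pipeline rather than as usable translations.

```python
# Minimal inference sketch. Assumptions: the repo id below (taken from this
# page) and the absence of a task prefix, which the card does not document.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/ddd4a6aa06d6ae42631b8799ef97d92a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# French source sentence in, Polish translation out.
inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```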

Training and evaluation data

More information needed
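
The dataset named in the header can be inspected directly with the datasets library. A minimal sketch; note that opus_books typically ships only a train split, so how the evaluation set was carved out is not documented here:

```python
# Sketch of loading the fr-pl pair of opus_books (the exact split and
# preprocessing used for this model are not documented on the card).
from datasets import load_dataset

ds = load_dataset("Helsinki-NLP/opus_books", "fr-pl")
print(ds["train"][0]["translation"])  # {'fr': '...', 'pl': '...'}
```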

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto Seq2SeqTrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: constant
  • num_epochs: 50
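
As a reference point, the list above corresponds roughly to the following Seq2SeqTrainingArguments. This is a sketch, not the published training script; output_dir is a hypothetical placeholder, and predict_with_generate is inferred from the BLEU column rather than stated on the card.

```python
# Approximate mapping of the listed hyperparameters onto the standard
# transformers trainer API (sketch only; the real script is not published).
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fr-pl",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,     # assumed: generation-based eval for BLEU
)
```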

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | BLEU |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:------:|
| No log | 0 | 0 | 18.5972 | 0 | 1.6580 | 0.3208 |
| No log | 1 | 70 | 18.2067 | 0.0078 | 2.1820 | 0.2932 |
| No log | 2 | 140 | 17.7177 | 0.0156 | 2.3440 | 0.3019 |
| No log | 3 | 210 | 17.1370 | 0.0312 | 2.6829 | 0.2702 |
| No log | 4 | 280 | 16.4891 | 0.0625 | 3.2230 | 0.2946 |
| No log | 5 | 350 | 15.3033 | 0.125 | 4.1663 | 0.2479 |
| No log | 6 | 420 | 13.6401 | 0.25 | 5.1414 | 0.2967 |
| 2.6129 | 7 | 490 | 11.3079 | 0.5 | 8.0574 | 0.4094 |
| 11.6429 | 8 | 560 | 7.8088 | 1.0 | 13.5010 | 0.3318 |
| 9.3515 | 9 | 630 | 5.9276 | 1.0 | 13.4543 | 0.4188 |
| 7.0719 | 10 | 700 | 5.0844 | 1.0 | 11.6559 | 0.4961 |
| 6.5733 | 11 | 770 | 4.8212 | 1.0 | 12.9264 | 0.6565 |
| 6.2979 | 12 | 840 | 4.6587 | 1.0 | 12.6095 | 0.7886 |
| 5.8217 | 13 | 910 | 4.4995 | 1.0 | 12.7657 | 0.9692 |
| 5.6473 | 14 | 980 | 4.3470 | 1.0 | 12.7114 | 1.1158 |
| 5.3408 | 15 | 1050 | 4.2188 | 1.0 | 13.2125 | 1.2465 |
| 5.2398 | 16 | 1120 | 4.0965 | 1.0 | 13.1490 | 0.7502 |
| 5.0955 | 17 | 1190 | 4.0011 | 1.0 | 13.5470 | 0.5956 |
| 4.8827 | 18 | 1260 | 3.8982 | 1.0 | 11.7615 | 0.8255 |
| 4.8067 | 19 | 1330 | 3.8328 | 1.0 | 12.2196 | 0.8645 |
| 4.6766 | 20 | 1400 | 3.7824 | 1.0 | 12.1924 | 0.8885 |
| 4.6094 | 21 | 1470 | 3.7223 | 1.0 | 12.4951 | 0.9598 |
| 4.5736 | 22 | 1540 | 3.6905 | 1.0 | 12.6724 | 0.9687 |
| 4.4444 | 23 | 1610 | 3.6601 | 1.0 | 12.8187 | 0.9986 |
| 4.3824 | 24 | 1680 | 3.6374 | 1.0 | 13.2163 | 1.0042 |
| 4.3578 | 25 | 1750 | 3.6156 | 1.0 | 13.1140 | 1.0644 |
| 4.2655 | 26 | 1820 | 3.5985 | 1.0 | 11.9219 | 1.0492 |
| 4.252 | 27 | 1890 | 3.5755 | 1.0 | 12.3008 | 1.0693 |
| 4.1737 | 28 | 1960 | 3.5686 | 1.0 | 12.7413 | 1.0925 |
| 4.1461 | 29 | 2030 | 3.5534 | 1.0 | 12.6865 | 1.0772 |
| 4.0841 | 30 | 2100 | 3.5476 | 1.0 | 12.8186 | 1.1322 |
| 4.0576 | 31 | 2170 | 3.5374 | 1.0 | 13.5435 | 1.2023 |
| 4.0192 | 32 | 2240 | 3.5235 | 1.0 | 13.4343 | 1.1625 |
| 3.9789 | 33 | 2310 | 3.5156 | 1.0 | 13.1692 | 1.1780 |
| 3.9396 | 34 | 2380 | 3.5059 | 1.0 | 13.4867 | 1.1580 |
| 3.9005 | 35 | 2450 | 3.4989 | 1.0 | 12.1223 | 1.2033 |
| 3.8795 | 36 | 2520 | 3.4927 | 1.0 | 11.9539 | 1.2470 |
| 3.851 | 37 | 2590 | 3.4914 | 1.0 | 12.0542 | 1.2446 |
| 3.8158 | 38 | 2660 | 3.4846 | 1.0 | 12.3248 | 1.1963 |
| 3.7645 | 39 | 2730 | 3.4799 | 1.0 | 12.3383 | 1.2744 |
| 3.7962 | 40 | 2800 | 3.4729 | 1.0 | 12.5361 | 1.2342 |
| 3.7486 | 41 | 2870 | 3.4722 | 1.0 | 13.2700 | 1.2756 |
| 3.7166 | 42 | 2940 | 3.4666 | 1.0 | 12.9179 | 1.3236 |
| 3.685 | 43 | 3010 | 3.4647 | 1.0 | 12.8274 | 1.3277 |
| 3.6813 | 44 | 3080 | 3.4617 | 1.0 | 11.8841 | 1.3205 |
| 3.6468 | 45 | 3150 | 3.4584 | 1.0 | 12.3643 | 1.3308 |
| 3.6138 | 46 | 3220 | 3.4559 | 1.0 | 12.4019 | 1.3099 |
| 3.5968 | 47 | 3290 | 3.4511 | 1.0 | 12.3915 | 1.4182 |
| 3.5644 | 48 | 3360 | 3.4496 | 1.0 | 12.8932 | 1.4216 |
| 3.5602 | 49 | 3430 | 3.4485 | 1.0 | 12.8400 | 1.3770 |
| 3.5279 | 50 | 3500 | 3.4500 | 1.0 | 12.7105 | 1.3591 |
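
The BLEU column is scored against the held-out references; a score in this form is conventionally produced with sacrebleu. Below is a minimal sketch using the evaluate library; this is an assumed workflow, not the card's actual evaluation code, and the sentences are illustrative:

```python
# Hedged example of computing a corpus BLEU score with sacrebleu via evaluate.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Dzień dobry, jak się masz?"]      # model outputs (illustrative)
references = [["Dzień dobry, jak się miewasz?"]]  # gold Polish references
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # same 0-100 scale as the BLEU column above
```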

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1