520b14835eced208a78ef9e8f2f99d7a

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [fr-it] dataset. It achieves the following results on the evaluation set:

  • Loss: 2.7937
  • Data Size: 1.0 (fraction of the full training set used)
  • Epoch Runtime: 58.5370 s
  • BLEU: 5.6361
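
For reference, a minimal usage sketch with the transformers library. The repo id below is taken from this page's model tree, and whether the model expects a task prefix (e.g. "translate French to Italian: ") is not documented in this card, so plain input is assumed.

```python
# A minimal sketch, not the documented usage: assumes the checkpoint is
# published under the repo id shown on this page and that no task prefix
# was used during fine-tuning.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/520b14835eced208a78ef9e8f2f99d7a"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# French source sentence -> Italian translation
inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Given the final BLEU of 5.64, outputs should be treated as a baseline rather than production-quality translations.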

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
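
The dataset itself is named at the top of the card. Below is a minimal sketch of loading it, assuming "fr-it" is a valid Helsinki-NLP/opus_books configuration; the train/eval split used for this run is not documented.

```python
# A minimal sketch; the actual preprocessing and split are assumptions.
from datasets import load_dataset

books = load_dataset("Helsinki-NLP/opus_books", "fr-it")
# opus_books ships a single "train" split of aligned sentence pairs, e.g.:
# {"id": "0", "translation": {"fr": "...", "it": "..."}}
print(books["train"][0])
```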

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction in code follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
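
A sketch reconstructing these settings as Seq2SeqTrainingArguments; the output_dir is hypothetical and the original training script is not included in the card, so this is an approximation rather than the exact configuration used.

```python
# A sketch reconstructing the listed hyperparameters; output_dir is
# hypothetical, and per-device batch size 8 on 4 GPUs yields the total
# train/eval batch size of 32 reported above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-fr-it",  # hypothetical
    learning_rate=5e-05,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # required to compute BLEU during evaluation
)
```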

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime (s) | BLEU   |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-----------------:|:------:|
| No log        | 0     | 0     | 17.0743         | 0         | 5.5982            | 0.2047 |
| No log        | 1     | 367   | 16.4893         | 0.0078    | 6.0438            | 0.2076 |
| No log        | 2     | 734   | 15.6521         | 0.0156    | 6.8942            | 0.2252 |
| No log        | 3     | 1101  | 14.1151         | 0.0312    | 8.2651            | 0.2289 |
| No log        | 4     | 1468  | 11.6662         | 0.0625    | 9.6650            | 0.2161 |
| 0.7599        | 5     | 1835  | 8.2824          | 0.125     | 12.6107           | 0.2983 |
| 8.6202        | 6     | 2202  | 5.2964          | 0.25      | 19.1086           | 0.7094 |
| 6.2023        | 7     | 2569  | 4.3212          | 0.5       | 31.9172           | 2.9934 |
| 5.0974        | 8     | 2936  | 3.8220          | 1.0       | 57.6145           | 2.1554 |
| 4.6036        | 9     | 3303  | 3.5523          | 1.0       | 57.8272           | 2.6853 |
| 4.3248        | 10    | 3670  | 3.4210          | 1.0       | 59.7893           | 3.0457 |
| 4.1866        | 11    | 4037  | 3.3420          | 1.0       | 58.6285           | 3.2571 |
| 4.0657        | 12    | 4404  | 3.2866          | 1.0       | 57.6160           | 3.4933 |
| 4.0059        | 13    | 4771  | 3.2401          | 1.0       | 58.4404           | 3.6619 |
| 3.8718        | 14    | 5138  | 3.2015          | 1.0       | 57.9625           | 3.8115 |
| 3.8261        | 15    | 5505  | 3.1692          | 1.0       | 58.4455           | 3.9516 |
| 3.7285        | 16    | 5872  | 3.1352          | 1.0       | 59.0941           | 4.0884 |
| 3.6943        | 17    | 6239  | 3.1219          | 1.0       | 58.3599           | 4.1323 |
| 3.6541        | 18    | 6606  | 3.0931          | 1.0       | 59.2689           | 4.2474 |
| 3.6291        | 19    | 6973  | 3.0716          | 1.0       | 59.4495           | 4.3364 |
| 3.5636        | 20    | 7340  | 3.0412          | 1.0       | 58.0654           | 4.4187 |
| 3.5061        | 21    | 7707  | 3.0389          | 1.0       | 57.7992           | 4.4748 |
| 3.4734        | 22    | 8074  | 3.0219          | 1.0       | 57.9885           | 4.5529 |
| 3.4102        | 23    | 8441  | 3.0044          | 1.0       | 58.9918           | 4.6240 |
| 3.3814        | 24    | 8808  | 2.9803          | 1.0       | 59.7393           | 4.7050 |
| 3.3919        | 25    | 9175  | 2.9830          | 1.0       | 58.7152           | 4.7763 |
| 3.2983        | 26    | 9542  | 2.9674          | 1.0       | 58.7490           | 4.8029 |
| 3.2863        | 27    | 9909  | 2.9622          | 1.0       | 59.5547           | 4.8354 |
| 3.2594        | 28    | 10276 | 2.9418          | 1.0       | 59.9988           | 4.8734 |
| 3.2504        | 29    | 10643 | 2.9263          | 1.0       | 59.5368           | 4.9443 |
| 3.2417        | 30    | 11010 | 2.9301          | 1.0       | 58.6146           | 4.9528 |
| 3.1811        | 31    | 11377 | 2.9141          | 1.0       | 58.3594           | 5.0152 |
| 3.1766        | 32    | 11744 | 2.9059          | 1.0       | 58.9723           | 5.0389 |
| 3.1313        | 33    | 12111 | 2.8911          | 1.0       | 58.4011           | 5.0914 |
| 3.1139        | 34    | 12478 | 2.8956          | 1.0       | 59.8377           | 5.1396 |
| 3.0672        | 35    | 12845 | 2.8871          | 1.0       | 59.5639           | 5.1947 |
| 3.0823        | 36    | 13212 | 2.8757          | 1.0       | 58.4524           | 5.2140 |
| 3.0490        | 37    | 13579 | 2.8646          | 1.0       | 58.3569           | 5.2484 |
| 3.0445        | 38    | 13946 | 2.8729          | 1.0       | 58.3784           | 5.3178 |
| 3.0336        | 39    | 14313 | 2.8476          | 1.0       | 59.2942           | 5.3525 |
| 2.9965        | 40    | 14680 | 2.8526          | 1.0       | 58.9237           | 5.3645 |
| 2.9969        | 41    | 15047 | 2.8385          | 1.0       | 58.7512           | 5.3852 |
| 2.9535        | 42    | 15414 | 2.8423          | 1.0       | 58.8361           | 5.4229 |
| 2.9320        | 43    | 15781 | 2.8336          | 1.0       | 58.5391           | 5.4622 |
| 2.9230        | 44    | 16148 | 2.8279          | 1.0       | 59.8299           | 5.4951 |
| 2.9285        | 45    | 16515 | 2.8244          | 1.0       | 59.5493           | 5.5138 |
| 2.9258        | 46    | 16882 | 2.8144          | 1.0       | 58.9045           | 5.5251 |
| 2.8831        | 47    | 17249 | 2.8164          | 1.0       | 59.3035           | 5.5434 |
| 2.8739        | 48    | 17616 | 2.8114          | 1.0       | 58.5114           | 5.6106 |
| 2.8258        | 49    | 17983 | 2.8138          | 1.0       | 60.7856           | 5.6095 |
| 2.8659        | 50    | 18350 | 2.7937          | 1.0       | 58.5370           | 5.6361 |
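
BLEU values like those above are conventionally computed with SacreBLEU. The snippet below is a sketch of that metric call using the evaluate library, not the card's actual evaluation code.

```python
# A sketch of a SacreBLEU-style corpus BLEU computation (an assumption;
# the evaluation code behind the table is not included in this card).
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Ciao, come stai?"]  # model outputs (Italian)
references = [["Ciao, come va?"]]   # one list of references per prediction
print(bleu.compute(predictions=predictions, references=references)["score"])
```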

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1