c591ac3fb982261b29782cd25fe3c5a2

This model is a fine-tuned version of google/umt5-small on the Helsinki-NLP/opus_books [es-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7078
  • Data Size: 1.0
  • Epoch Runtime: 216.9740
  • Bleu: 12.6009
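
As a quick start, the sketch below loads the checkpoint for Spanish-to-French generation. It is a minimal sketch rather than the author's documented usage: the repo ID is taken from this card's model tree, and it assumes the fine-tuned model translates raw Spanish input without a task prefix (the card does not document one).

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Repo ID from this card's model tree; swap in a local path if needed.
model_id = "contemmcm/c591ac3fb982261b29782cd25fe3c5a2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: the fine-tuned model maps Spanish source text to French
# directly, with no task prefix (none is documented in this card).
inputs = tokenizer("La vida es sueño.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```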

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
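
In the absence of documented preprocessing, here is a minimal sketch of loading the base dataset named above. It assumes the standard es-fr configuration of opus_books, which ships only a train split; the 90/10 evaluation split is an illustrative assumption, not taken from this card.

```python
from datasets import load_dataset

# opus_books provides a single "train" split per language pair; the
# 90/10 split below is an assumption, not documented by this card.
raw = load_dataset("Helsinki-NLP/opus_books", "es-fr")
splits = raw["train"].train_test_split(test_size=0.1, seed=42)

example = splits["train"][0]["translation"]
print(example["es"], "->", example["fr"])
```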

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
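
For reference, the list above maps onto Seq2SeqTrainingArguments roughly as sketched below. This is an approximation, not the author's script: output_dir is a hypothetical placeholder, and the total batch size of 32 arises from the per-device size of 8 across 4 GPUs under a multi-GPU launcher rather than from any single argument.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is a placeholder
# not specified by the card. Launched across 4 GPUs (e.g. via torchrun or
# accelerate), per-device batch size 8 gives the total batch size of 32.
training_args = Seq2SeqTrainingArguments(
    output_dir="umt5-small-opus-books-es-fr",  # hypothetical path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,  # required for generation-based BLEU
)
```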

Training results

| Training Loss | Epoch | Step  | Validation Loss | Data Size | Epoch Runtime | Bleu    |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:-------------:|:-------:|
| No log        | 0     | 0     | 14.9661         | 0         | 18.4328       | 0.1698  |
| No log        | 1     | 1407  | 13.8678         | 0.0078    | 21.2159       | 0.1976  |
| No log        | 2     | 2814  | 12.1745         | 0.0156    | 21.7496       | 0.1463  |
| 0.3826        | 3     | 4221  | 10.0626         | 0.0312    | 25.6574       | 0.1737  |
| 9.4091        | 4     | 5628  | 6.0042          | 0.0625    | 31.2974       | 0.4278  |
| 5.7989        | 5     | 7035  | 3.7447          | 0.125     | 43.6643       | 4.9009  |
| 4.5277        | 6     | 8442  | 3.1458          | 0.25      | 69.2407       | 3.8997  |
| 3.7289        | 7     | 9849  | 2.6678          | 0.5       | 117.4917      | 5.3978  |
| 3.2112        | 8.0   | 11256 | 2.4129          | 1.0       | 214.7005      | 6.6841  |
| 2.9436        | 9.0   | 12663 | 2.2953          | 1.0       | 216.1568      | 7.5042  |
| 2.8647        | 10.0  | 14070 | 2.2094          | 1.0       | 213.5292      | 8.1059  |
| 2.6919        | 11.0  | 15477 | 2.1563          | 1.0       | 212.4440      | 8.6020  |
| 2.6117        | 12.0  | 16884 | 2.1075          | 1.0       | 214.2269      | 8.9490  |
| 2.5448        | 13.0  | 18291 | 2.0759          | 1.0       | 213.5605      | 9.2504  |
| 2.5115        | 14.0  | 19698 | 2.0463          | 1.0       | 214.8060      | 9.4776  |
| 2.4257        | 15.0  | 21105 | 2.0095          | 1.0       | 214.7406      | 9.7150  |
| 2.3938        | 16.0  | 22512 | 1.9898          | 1.0       | 215.9306      | 9.8699  |
| 2.3467        | 17.0  | 23919 | 1.9705          | 1.0       | 215.7179      | 10.0840 |
| 2.298         | 18.0  | 25326 | 1.9575          | 1.0       | 215.5572      | 10.2474 |
| 2.2712        | 19.0  | 26733 | 1.9382          | 1.0       | 213.5135      | 10.3838 |
| 2.2264        | 20.0  | 28140 | 1.9135          | 1.0       | 214.5938      | 10.5263 |
| 2.1897        | 21.0  | 29547 | 1.8935          | 1.0       | 216.1186      | 10.6721 |
| 2.167         | 22.0  | 30954 | 1.8883          | 1.0       | 212.1273      | 10.7909 |
| 2.1503        | 23.0  | 32361 | 1.8746          | 1.0       | 217.0957      | 10.8991 |
| 2.0907        | 24.0  | 33768 | 1.8560          | 1.0       | 215.7370      | 11.0596 |
| 2.1052        | 25.0  | 35175 | 1.8475          | 1.0       | 216.2399      | 11.1471 |
| 2.0652        | 26.0  | 36582 | 1.8431          | 1.0       | 214.3941      | 11.2386 |
| 2.0244        | 27.0  | 37989 | 1.8248          | 1.0       | 216.7905      | 11.3224 |
| 2.0077        | 28.0  | 39396 | 1.8150          | 1.0       | 215.1501      | 11.4013 |
| 2.0417        | 29.0  | 40803 | 1.8087          | 1.0       | 214.4889      | 11.4483 |
| 2.008         | 30.0  | 42210 | 1.8023          | 1.0       | 213.1100      | 11.5161 |
| 1.9606        | 31.0  | 43617 | 1.7926          | 1.0       | 214.1516      | 11.6453 |
| 1.9298        | 32.0  | 45024 | 1.7977          | 1.0       | 214.0810      | 11.6786 |
| 1.938         | 33.0  | 46431 | 1.7829          | 1.0       | 223.3107      | 11.7729 |
| 1.8941        | 34.0  | 47838 | 1.7701          | 1.0       | 223.4652      | 11.8105 |
| 1.9073        | 35.0  | 49245 | 1.7751          | 1.0       | 221.6524      | 11.8768 |
| 1.8586        | 36.0  | 50652 | 1.7664          | 1.0       | 223.5310      | 11.9486 |
| 1.8678        | 37.0  | 52059 | 1.7554          | 1.0       | 217.4213      | 12.0003 |
| 1.8227        | 38.0  | 53466 | 1.7456          | 1.0       | 217.4962      | 12.0672 |
| 1.7791        | 39.0  | 54873 | 1.7452          | 1.0       | 220.3917      | 12.0985 |
| 1.8189        | 40.0  | 56280 | 1.7439          | 1.0       | 217.9907      | 12.1608 |
| 1.8328        | 41.0  | 57687 | 1.7462          | 1.0       | 217.8165      | 12.2203 |
| 1.8378        | 42.0  | 59094 | 1.7370          | 1.0       | 217.3545      | 12.3155 |
| 1.7768        | 43.0  | 60501 | 1.7386          | 1.0       | 219.9349      | 12.3127 |
| 1.752         | 44.0  | 61908 | 1.7253          | 1.0       | 219.1841      | 12.3925 |
| 1.7366        | 45.0  | 63315 | 1.7222          | 1.0       | 216.8871      | 12.4186 |
| 1.6971        | 46.0  | 64722 | 1.7219          | 1.0       | 216.8452      | 12.4796 |
| 1.7429        | 47.0  | 66129 | 1.7136          | 1.0       | 219.9515      | 12.5299 |
| 1.6928        | 48.0  | 67536 | 1.7111          | 1.0       | 217.4749      | 12.5543 |
| 1.6711        | 49.0  | 68943 | 1.7052          | 1.0       | 218.2828      | 12.5817 |
| 1.6935        | 50.0  | 70350 | 1.7078          | 1.0       | 216.9740      | 12.6009 |
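
The Bleu column above is presumably produced by generation-based evaluation (predict_with_generate). The card does not say which BLEU implementation was used; the sketch below assumes the sacrebleu wrapper from the evaluate library, the common choice in transformers translation examples.

```python
import evaluate

# Assumption: BLEU computed via the evaluate library's sacrebleu wrapper;
# this card does not state the actual implementation used.
bleu = evaluate.load("sacrebleu")

predictions = ["Le chat dort sur le tapis."]   # detokenized model outputs
references = [["Le chat dort sur le tapis."]]  # one or more references each
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))  # corpus-level BLEU; 100.0 for an exact match
```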

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1