# 13f9ad0fd0322d7cd9273f06b65b5233
This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [de-es] dataset. It achieves the following results on the evaluation set:
- Loss: 2.0161
- Data Size: 1.0
- Epoch Runtime: 144.7519
- Bleu: 5.7980
## Model description

More information needed
## Intended uses & limitations

More information needed
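The card does not include a usage snippet. A minimal, untested inference sketch with 🤗 Transformers follows, assuming the checkpoint is published under the repo id shown (German → Spanish); running it downloads the model, and any task prefix used during preprocessing is undocumented here:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Assumed repo id, taken from this card's title; adjust if the checkpoint lives elsewhere.
model_id = "contemmcm/13f9ad0fd0322d7cd9273f06b65b5233"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# mT5 fine-tunes usually take the raw source sentence; whether a task prefix
# (e.g. "translate German to Spanish: ") was used in training is not documented.
inputs = tokenizer("Das Haus ist alt.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```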
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: constant
- num_epochs: 50
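The per-device and total batch sizes above are related by a simple product; a quick sanity check (gradient_accumulation_steps is assumed to be 1, since it is not listed in the card):

```python
train_batch_size = 8   # per device, from the hyperparameters above
num_devices = 4        # multi-GPU, from the hyperparameters above
grad_accum_steps = 1   # assumed; not listed in the card

total_train_batch_size = train_batch_size * num_devices * grad_accum_steps
print(total_train_batch_size)  # 32, matching the reported total_train_batch_size
```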
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 15.6258 | 0 | 12.2547 | 0.0134 |
| No log | 1 | 688 | 12.4983 | 0.0078 | 13.7138 | 0.0155 |
| No log | 2 | 1376 | 11.5197 | 0.0156 | 14.9386 | 0.0127 |
| No log | 3 | 2064 | 10.6213 | 0.0312 | 17.9074 | 0.0106 |
| 0.4954 | 4 | 2752 | 9.6613 | 0.0625 | 22.3583 | 0.0081 |
| 0.8229 | 5 | 3440 | 6.1175 | 0.125 | 30.8167 | 0.0141 |
| 3.8555 | 6 | 4128 | 2.7627 | 0.25 | 47.7922 | 2.7902 |
| 3.2049 | 7 | 4816 | 2.4681 | 0.5 | 83.9102 | 3.3027 |
| 2.8805 | 8 | 5504 | 2.3248 | 1.0 | 149.4133 | 2.9320 |
| 2.7253 | 9 | 6192 | 2.2407 | 1.0 | 145.2460 | 3.9451 |
| 2.5951 | 10 | 6880 | 2.1922 | 1.0 | 146.1101 | 4.2159 |
| 2.5115 | 11 | 7568 | 2.1599 | 1.0 | 157.3725 | 4.4794 |
| 2.4214 | 12 | 8256 | 2.1309 | 1.0 | 155.4184 | 4.9169 |
| 2.3681 | 13 | 8944 | 2.1120 | 1.0 | 148.8502 | 4.9829 |
| 2.3350 | 14 | 9632 | 2.0946 | 1.0 | 148.5904 | 5.1887 |
| 2.2482 | 15 | 10320 | 2.0830 | 1.0 | 145.3452 | 5.2576 |
| 2.1906 | 16 | 11008 | 2.0633 | 1.0 | 145.2194 | 5.3111 |
| 2.1795 | 17 | 11696 | 2.0571 | 1.0 | 145.1608 | 5.5390 |
| 2.1068 | 18 | 12384 | 2.0420 | 1.0 | 145.6123 | 5.5147 |
| 2.1016 | 19 | 13072 | 2.0314 | 1.0 | 144.0503 | 5.6088 |
| 2.0614 | 20 | 13760 | 2.0222 | 1.0 | 146.8678 | 5.6972 |
| 2.0181 | 21 | 14448 | 2.0209 | 1.0 | 145.4855 | 5.7244 |
| 1.9832 | 22 | 15136 | 2.0253 | 1.0 | 145.5249 | 5.7423 |
| 1.9477 | 23 | 15824 | 2.0215 | 1.0 | 146.1704 | 5.6548 |
| 1.9223 | 24 | 16512 | 2.0203 | 1.0 | 145.2873 | 5.9267 |
| 1.9124 | 25 | 17200 | 2.0112 | 1.0 | 145.5622 | 5.7941 |
| 1.8305 | 26 | 17888 | 2.0095 | 1.0 | 144.1184 | 5.9749 |
| 1.8449 | 27 | 18576 | 2.0177 | 1.0 | 144.2892 | 5.8225 |
| 1.8013 | 28 | 19264 | 2.0191 | 1.0 | 145.0509 | 5.9746 |
| 1.7786 | 29 | 19952 | 2.0137 | 1.0 | 144.8962 | 5.9460 |
| 1.7274 | 30 | 20640 | 2.0161 | 1.0 | 144.7519 | 5.7980 |
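The Data Size column doubles each epoch, from roughly 1/128 of the training set at epoch 1 (0.0078 ≈ 1/128, 0.0156 ≈ 1/64, ...) until the full set is reached at epoch 8. A sketch of that schedule, inferred from the table rather than from any published training script:

```python
def data_fraction(epoch):
    """Fraction of the training set used at a given epoch (inferred from the table)."""
    if epoch == 0:
        return 0.0  # the epoch-0 row is an untrained baseline evaluation
    return min(1.0, 2 ** (epoch - 1) / 128)

for epoch in (1, 2, 7, 8, 30):
    print(epoch, round(data_fraction(epoch), 4))  # 0.0078, 0.0156, 0.5, 1.0, 1.0
```

This kind of progressive data scaling explains why "No log" appears for early training losses and why epoch runtime roughly doubles per epoch until epoch 8.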
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.2.0
- Tokenizers 0.22.1