a317f3b8086eff826b3405960696df53

This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [en-es] dataset. It achieves the following results on the evaluation set (these match the last logged epoch, 32, in the training results table):

Loss: 1.6287
Data Size: 1.0
Epoch Runtime: 487.6002
Bleu: 9.3762

Model description

More information needed

Intended uses & limitations

More information needed
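
Until this section is filled in, the following is a minimal loading sketch rather than documented usage. It assumes the checkpoint is published under the repository id contemmcm/a317f3b8086eff826b3405960696df53, that it translates English to Spanish (the card only names the en-es dataset configuration, not the direction), and that no task prefix is required. The helper name `translate` and the `max_new_tokens` default are illustrative choices, not part of the original card.

```python
# Assumed repository id for this checkpoint on the Hugging Face Hub.
MODEL_ID = "contemmcm/a317f3b8086eff826b3405960696df53"


def translate(texts, max_new_tokens=128):
    """Translate a list of strings and return the decoded outputs.

    Hypothetical helper: the translation direction (en -> es) and the
    absence of a task prefix are assumptions, not stated in the card.
    """
    # Imported lazily so the module can be imported without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    # Tokenize as a padded batch of PyTorch tensors.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Example call (downloads the model weights on first use):
# translate(["The house stood at the end of a long road."])
```

With an evaluation BLEU of about 9.4, outputs should be treated as rough drafts.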

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
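
The total batch sizes listed above follow from the per-device sizes and the device count. The short check below reproduces that arithmetic. It assumes a gradient accumulation factor of 1, which the card does not list but which is consistent with the reported totals.

```python
# Values copied from the hyperparameter list.
train_batch_size = 8   # per device
eval_batch_size = 8    # per device
num_devices = 4

# Assumption: gradient accumulation is 1 (not listed in the card).
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```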

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	17.5906	0	38.7980	0.0066
No log	1	2336	11.2078	0.0078	42.4555	0.0090
0.2148	2	4672	8.1461	0.0156	46.5754	0.0132
0.2317	3	7008	3.5696	0.0312	54.7314	0.2977
3.4847	4	9344	2.5315	0.0625	69.8693	5.4395
3.1358	5	11680	2.3672	0.125	96.1808	5.3783
2.8317	6	14016	2.2252	0.25	149.3766	5.0234
2.6068	7	16352	2.0973	0.5	271.2784	6.5774
2.3911	8.0	18688	1.9709	1.0	507.0318	7.6639
2.2176	9.0	21024	1.8925	1.0	497.3987	8.1176
2.1229	10.0	23360	1.8367	1.0	483.7592	8.5419
2.0508	11.0	25696	1.7997	1.0	487.4110	8.5801
1.9534	12.0	28032	1.7722	1.0	489.1477	8.8942
1.8972	13.0	30368	1.7458	1.0	490.0701	8.6582
1.8732	14.0	32704	1.7242	1.0	490.0359	8.7500
1.8086	15.0	35040	1.7088	1.0	488.0000	9.0016
1.8172	16.0	37376	1.6885	1.0	501.7522	8.9066
1.7176	17.0	39712	1.6740	1.0	488.9021	9.0096
1.7279	18.0	42048	1.6703	1.0	514.9460	9.1182
1.687	19.0	44384	1.6622	1.0	509.9660	9.1538
1.6463	20.0	46720	1.6530	1.0	509.7470	9.2026
1.6062	21.0	49056	1.6472	1.0	511.0166	9.0921
1.5816	22.0	51392	1.6488	1.0	515.4553	8.9717
1.546	23.0	53728	1.6392	1.0	513.8395	9.1425
1.4913	24.0	56064	1.6401	1.0	509.7644	9.1420
1.5082	25.0	58400	1.6299	1.0	509.5740	9.0798
1.4748	26.0	60736	1.6231	1.0	510.2230	9.2542
1.4534	27.0	63072	1.6280	1.0	504.9012	9.1842
1.4294	28.0	65408	1.6194	1.0	502.2083	9.2138
1.4022	29.0	67744	1.6272	1.0	483.3266	9.3731
1.395	30.0	70080	1.6300	1.0	486.9168	9.4220
1.3651	31.0	72416	1.6317	1.0	485.8095	9.3675
1.3667	32.0	74752	1.6287	1.0	487.6002	9.3762
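
The headline metrics correspond to the last logged row (epoch 32), which is not the best row on either metric. The sketch below copies the epoch, validation loss, and BLEU columns from the table and picks out the best epoch for each. The `ROWS` list and variable names are illustrative. The card does not state which checkpoint was selected or why logging stops before the configured 50 epochs.

```python
# (epoch, validation_loss, bleu) copied from the training results table.
ROWS = [
    (0, 17.5906, 0.0066), (1, 11.2078, 0.0090), (2, 8.1461, 0.0132),
    (3, 3.5696, 0.2977), (4, 2.5315, 5.4395), (5, 2.3672, 5.3783),
    (6, 2.2252, 5.0234), (7, 2.0973, 6.5774), (8, 1.9709, 7.6639),
    (9, 1.8925, 8.1176), (10, 1.8367, 8.5419), (11, 1.7997, 8.5801),
    (12, 1.7722, 8.8942), (13, 1.7458, 8.6582), (14, 1.7242, 8.7500),
    (15, 1.7088, 9.0016), (16, 1.6885, 8.9066), (17, 1.6740, 9.0096),
    (18, 1.6703, 9.1182), (19, 1.6622, 9.1538), (20, 1.6530, 9.2026),
    (21, 1.6472, 9.0921), (22, 1.6488, 8.9717), (23, 1.6392, 9.1425),
    (24, 1.6401, 9.1420), (25, 1.6299, 9.0798), (26, 1.6231, 9.2542),
    (27, 1.6280, 9.1842), (28, 1.6194, 9.2138), (29, 1.6272, 9.3731),
    (30, 1.6300, 9.4220), (31, 1.6317, 9.3675), (32, 1.6287, 9.3762),
]

# Lowest validation loss and highest BLEU across all logged epochs.
best_loss = min(ROWS, key=lambda row: row[1])
best_bleu = max(ROWS, key=lambda row: row[2])

print("best validation loss:", best_loss)  # (28, 1.6194, 9.2138)
print("best BLEU:", best_bleu)             # (30, 1.63, 9.422)
```

Validation loss bottoms out at epoch 28 and BLEU peaks at epoch 30. The differences from epoch 32 are small (about 0.009 in loss and 0.05 BLEU).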

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1