c5a6aa1cac98056a363e6433add3f0d1

This model is a fine-tuned version of google/mt5-base on the Helsinki-NLP/opus_books [en-fr] dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2190
  • Data Size: 1.0
  • Epoch Runtime: 670.7207
  • Bleu: 12.7918

Model description

More information needed

Intended uses & limitations

More information needed
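Pending fuller documentation, a minimal inference sketch is possible. The repository id matches this model; the `"translate English to French: "` task prefix is an assumption carried over from T5-style translation fine-tuning and may not match the actual preprocessing used here:

```python
# Assumed task prefix: T5-family translation fine-tunes usually prepend one,
# but the exact prefix (if any) is not documented in this card.
PREFIX = "translate English to French: "

def build_inputs(texts, prefix=PREFIX):
    """Prepend the (assumed) task prefix to each source sentence."""
    return [prefix + t for t in texts]

if __name__ == "__main__":
    # Heavy imports kept here so the helper above stays dependency-free.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    repo = "contemmcm/c5a6aa1cac98056a363e6433add3f0d1"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForSeq2SeqLM.from_pretrained(repo)

    batch = tokenizer(build_inputs(["The cat sleeps."]), return_tensors="pt")
    out = model.generate(**batch, max_new_tokens=64)
    print(tokenizer.batch_decode(out, skip_special_tokens=True))
```

If the checkpoint was trained without a prefix, pass `prefix=""` to `build_inputs`.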

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
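The per-device and total batch sizes above are consistent: with 4 devices, data-parallel training, and no gradient accumulation, the effective batch size is the per-device size times the device count. A small sketch collecting the same values (the dict keys are illustrative, not a real API):

```python
# Hyperparameters copied from the list above, gathered as a plain dict.
hparams = {
    "learning_rate": 5e-5,
    "train_batch_size": 8,   # per device
    "eval_batch_size": 8,    # per device
    "seed": 42,
    "num_devices": 4,
    "lr_scheduler_type": "constant",
    "num_epochs": 50,
    "optimizer": "adamw_torch",
    "adam_betas": (0.9, 0.999),
    "adam_epsilon": 1e-8,
}

# Effective batch size = per-device batch size x device count
# (no gradient accumulation is listed in this card).
total_train_batch_size = hparams["train_batch_size"] * hparams["num_devices"]
total_eval_batch_size = hparams["eval_batch_size"] * hparams["num_devices"]
print(total_train_batch_size, total_eval_batch_size)  # -> 32 32
```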

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime (s) | Bleu |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 16.0589 | 0 | 50.9773 | 0.0100 |
| No log | 1 | 3177 | 11.1813 | 0.0078 | 56.8912 | 0.0142 |
| 0.2118 | 2 | 6354 | 6.6049 | 0.0156 | 63.2922 | 0.0194 |
| 4.6504 | 3 | 9531 | 2.6897 | 0.0312 | 73.9785 | 0.9699 |
| 3.2417 | 4 | 12708 | 2.1665 | 0.0625 | 93.2277 | 4.1242 |
| 2.65 | 5 | 15885 | 1.9351 | 0.125 | 130.2395 | 5.0878 |
| 2.3925 | 6 | 19062 | 1.8160 | 0.25 | 205.5718 | 6.3567 |
| 2.1286 | 7 | 22239 | 1.6737 | 0.5 | 357.6948 | 7.8982 |
| 1.9196 | 8 | 25416 | 1.5453 | 1.0 | 661.9557 | 9.3487 |
| 1.7459 | 9 | 28593 | 1.4706 | 1.0 | 659.2342 | 9.9047 |
| 1.685 | 10 | 31770 | 1.4182 | 1.0 | 664.3399 | 10.4173 |
| 1.5963 | 11 | 34947 | 1.3772 | 1.0 | 665.4371 | 10.7262 |
| 1.5236 | 12 | 38124 | 1.3546 | 1.0 | 670.7171 | 10.9350 |
| 1.4636 | 13 | 41301 | 1.3315 | 1.0 | 669.2230 | 11.0899 |
| 1.4238 | 14 | 44478 | 1.3111 | 1.0 | 670.5610 | 11.2603 |
| 1.388 | 15 | 47655 | 1.2931 | 1.0 | 670.4118 | 11.5175 |
| 1.3304 | 16 | 50832 | 1.2822 | 1.0 | 671.8367 | 11.5027 |
| 1.3156 | 17 | 54009 | 1.2675 | 1.0 | 674.5763 | 11.6799 |
| 1.2983 | 18 | 57186 | 1.2560 | 1.0 | 673.3043 | 11.8304 |
| 1.2557 | 19 | 60363 | 1.2506 | 1.0 | 677.0719 | 11.9009 |
| 1.2131 | 20 | 63540 | 1.2390 | 1.0 | 677.3352 | 11.9842 |
| 1.1947 | 21 | 66717 | 1.2344 | 1.0 | 668.4357 | 12.1337 |
| 1.1726 | 22 | 69894 | 1.2325 | 1.0 | 673.0276 | 12.1706 |
| 1.1496 | 23 | 73071 | 1.2251 | 1.0 | 669.9871 | 12.3075 |
| 1.1575 | 24 | 76248 | 1.2194 | 1.0 | 665.4835 | 12.3460 |
| 1.128 | 25 | 79425 | 1.2180 | 1.0 | 669.1375 | 12.4199 |
| 1.1023 | 26 | 82602 | 1.2168 | 1.0 | 667.3919 | 12.4697 |
| 1.0738 | 27 | 85779 | 1.2137 | 1.0 | 663.2292 | 12.5530 |
| 1.0703 | 28 | 88956 | 1.2098 | 1.0 | 668.9212 | 12.5429 |
| 1.0455 | 29 | 92133 | 1.2121 | 1.0 | 664.4037 | 12.6790 |
| 1.0289 | 30 | 95310 | 1.2152 | 1.0 | 674.1455 | 12.6942 |
| 1.0149 | 31 | 98487 | 1.2121 | 1.0 | 665.4537 | 12.7257 |
| 1.0017 | 32 | 101664 | 1.2087 | 1.0 | 667.0060 | 12.7818 |
| 0.9835 | 33 | 104841 | 1.2099 | 1.0 | 666.1137 | 12.8043 |
| 0.9792 | 34 | 108018 | 1.2111 | 1.0 | 669.2019 | 12.8400 |
| 0.9377 | 35 | 111195 | 1.2129 | 1.0 | 663.0137 | 12.7757 |
| 0.9088 | 36 | 114372 | 1.2190 | 1.0 | 670.7207 | 12.7918 |
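The Data Size column suggests a warm-up schedule: the fraction of the training set roughly doubles each epoch, from 1/128 at epoch 1 until the full dataset is reached at epoch 8. This is an inference from the table, not a documented training flag. A sketch of that schedule:

```python
def data_size(epoch: int) -> float:
    """Fraction of the training set used at a given epoch, as read off the
    results table: 1/128 at epoch 1, doubling each epoch, capped at 1.0.
    This reconstructs the pattern in the table; the actual schedule used
    during training is not documented in this card."""
    if epoch == 0:
        return 0.0
    return min(1.0, 2 ** (epoch - 1) / 128)

# Table values are rounded to four decimals (e.g. 1/128 = 0.0078125 -> 0.0078).
print(round(data_size(1), 4), data_size(7), data_size(8))  # -> 0.0078 0.5 1.0
```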

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1