8d80c4225137775af0858becea32124c

This model is a fine-tuned version of google/mt5-small on the fr-pl (French-Polish) pair of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1074
  • Data Size: 1.0
  • Epoch Runtime: 12.3354
  • Bleu: 0.9452
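As a quick usage reference, here is a minimal inference sketch with the Transformers library. The repository id is taken from this page's model tree; the example sentence, beam settings, and the absence of a task prefix are illustrative assumptions rather than documented choices.

```python
# Minimal translation sketch. Assumptions: the checkpoint is available under
# the id below and was fine-tuned for fr->pl without a task prefix.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "contemmcm/8d80c4225137775af0858becea32124c"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Bonjour, comment allez-vous ?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```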

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
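As a rough illustration of how these settings map onto the Transformers trainer, here is a hedged Seq2SeqTrainingArguments sketch; the actual training script is not published in this card, and the output_dir name is hypothetical.

```python
# Hypothetical mapping of the listed hyperparameters onto
# Seq2SeqTrainingArguments; the real training script is not included here.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-opus-books-fr-pl",  # illustrative name
    learning_rate=5e-5,
    per_device_train_batch_size=8,  # x 4 GPUs -> total train batch size 32
    per_device_eval_batch_size=8,   # x 4 GPUs -> total eval batch size 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    predict_with_generate=True,     # needed to compute BLEU during eval
)
```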

Training results

The Data Size column appears to report the fraction of the training set used in each epoch; the schedule doubles the subset each epoch until the full dataset is in use from epoch 8 onward.

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Bleu |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:----:|
| No log | 0 | 0 | 30.1871 | 0 | 1.6414 | 0.0063 |
| No log | 1 | 70 | 29.9596 | 0.0078 | 2.6833 | 0.0049 |
| No log | 2 | 140 | 29.1372 | 0.0156 | 2.1655 | 0.0065 |
| No log | 3 | 210 | 26.7750 | 0.0312 | 2.5830 | 0.0081 |
| No log | 4 | 280 | 24.2698 | 0.0625 | 2.8853 | 0.0073 |
| No log | 5 | 350 | 22.3393 | 0.125 | 3.7455 | 0.0077 |
| No log | 6 | 420 | 16.7950 | 0.25 | 4.8519 | 0.0108 |
| 3.2519 | 7 | 490 | 12.6226 | 0.5 | 7.1730 | 0.0119 |
| 13.1623 | 8 | 560 | 8.4490 | 1.0 | 12.1719 | 0.0202 |
| 9.943 | 9 | 630 | 6.1646 | 1.0 | 12.3237 | 0.0197 |
| 6.4624 | 10 | 700 | 4.4943 | 1.0 | 11.2531 | 0.0547 |
| 5.5913 | 11 | 770 | 3.9076 | 1.0 | 12.0875 | 0.1409 |
| 5.1688 | 12 | 840 | 3.7176 | 1.0 | 11.7308 | 0.1435 |
| 4.7396 | 13 | 910 | 3.6146 | 1.0 | 12.0834 | 0.1751 |
| 4.5913 | 14 | 980 | 3.5379 | 1.0 | 11.8320 | 0.3039 |
| 4.4126 | 15 | 1050 | 3.4947 | 1.0 | 11.8071 | 0.3769 |
| 4.3419 | 16 | 1120 | 3.4415 | 1.0 | 11.8549 | 0.4286 |
| 4.2584 | 17 | 1190 | 3.4153 | 1.0 | 11.9149 | 0.4475 |
| 4.1421 | 18 | 1260 | 3.3831 | 1.0 | 12.4453 | 0.4651 |
| 4.1184 | 19 | 1330 | 3.3586 | 1.0 | 11.5380 | 0.5262 |
| 4.0221 | 20 | 1400 | 3.3399 | 1.0 | 11.8791 | 0.5627 |
| 3.9764 | 21 | 1470 | 3.3146 | 1.0 | 13.1879 | 0.5702 |
| 3.9722 | 22 | 1540 | 3.2963 | 1.0 | 12.2334 | 0.6373 |
| 3.8672 | 23 | 1610 | 3.2737 | 1.0 | 12.3612 | 0.6230 |
| 3.8153 | 24 | 1680 | 3.2606 | 1.0 | 12.2318 | 0.6282 |
| 3.8111 | 25 | 1750 | 3.2451 | 1.0 | 12.7453 | 0.6573 |
| 3.7494 | 26 | 1820 | 3.2336 | 1.0 | 13.0331 | 0.6433 |
| 3.7313 | 27 | 1890 | 3.2227 | 1.0 | 11.5825 | 0.7294 |
| 3.6749 | 28 | 1960 | 3.2139 | 1.0 | 11.4649 | 0.7157 |
| 3.665 | 29 | 2030 | 3.2049 | 1.0 | 11.5801 | 0.6979 |
| 3.6267 | 30 | 2100 | 3.1945 | 1.0 | 11.7894 | 0.7269 |
| 3.5956 | 31 | 2170 | 3.1884 | 1.0 | 12.8793 | 0.7162 |
| 3.5529 | 32 | 2240 | 3.1804 | 1.0 | 12.3465 | 0.7233 |
| 3.5436 | 33 | 2310 | 3.1742 | 1.0 | 12.6375 | 0.7587 |
| 3.5025 | 34 | 2380 | 3.1637 | 1.0 | 12.9137 | 0.8146 |
| 3.469 | 35 | 2450 | 3.1614 | 1.0 | 13.4902 | 0.8093 |
| 3.4704 | 36 | 2520 | 3.1543 | 1.0 | 11.5728 | 0.7949 |
| 3.4422 | 37 | 2590 | 3.1511 | 1.0 | 12.1080 | 0.8058 |
| 3.4156 | 38 | 2660 | 3.1463 | 1.0 | 11.9007 | 0.8258 |
| 3.3779 | 39 | 2730 | 3.1411 | 1.0 | 12.0938 | 0.8863 |
| 3.3928 | 40 | 2800 | 3.1357 | 1.0 | 12.2520 | 0.8939 |
| 3.3517 | 41 | 2870 | 3.1311 | 1.0 | 12.6665 | 0.8632 |
| 3.3405 | 42 | 2940 | 3.1254 | 1.0 | 13.1554 | 0.9187 |
| 3.3058 | 43 | 3010 | 3.1275 | 1.0 | 12.7304 | 0.8878 |
| 3.2953 | 44 | 3080 | 3.1200 | 1.0 | 14.0716 | 0.9595 |
| 3.281 | 45 | 3150 | 3.1176 | 1.0 | 11.2074 | 0.9117 |
| 3.2565 | 46 | 3220 | 3.1164 | 1.0 | 11.6439 | 0.9633 |
| 3.2368 | 47 | 3290 | 3.1132 | 1.0 | 11.6638 | 0.9907 |
| 3.2169 | 48 | 3360 | 3.1145 | 1.0 | 12.0850 | 0.9506 |
| 3.1816 | 49 | 3430 | 3.1111 | 1.0 | 12.1911 | 0.9197 |
| 3.1793 | 50 | 3500 | 3.1074 | 1.0 | 12.3354 | 0.9452 |
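The Bleu column can be reproduced in spirit with the evaluate library; the sketch below is a hedged illustration, since the exact evaluation code is not documented in this card.

```python
# Hedged sketch of a BLEU computation with the `evaluate` library; the
# original evaluation script is not included here, so the metric choice
# (sacrebleu) and the example strings are assumptions.
import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["Dzień dobry, jak się masz?"]        # hypothetical model output
references = [["Dzień dobry, jak się miewasz?"]]    # one reference list per prediction
result = bleu.compute(predictions=predictions, references=references)
print(result["score"])  # sacrebleu reports BLEU on a 0-100 scale
```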

Framework versions

  • Transformers 4.57.0
  • Pytorch 2.8.0+cu128
  • Datasets 4.2.0
  • Tokenizers 0.22.1
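To sanity-check a local environment against the versions above, a minimal Python check (assuming all four packages are installed):

```python
# Print installed versions to compare against the ones listed above.
import datasets
import tokenizers
import torch
import transformers

for pkg in (transformers, torch, datasets, tokenizers):
    print(pkg.__name__, pkg.__version__)
```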