mt5-small-si-spelling-correction

This model is a fine-tuned version of google/mt5-small on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.4102
Bleu: 69.7567
Exact Match: 0.5464
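
The card does not say how Exact Match is computed. A common definition, assumed here, is the fraction of predictions that are identical to their reference after trimming surrounding whitespace. The helper name `exact_match` and the sample strings below are illustrative only and are not taken from the evaluation set.

```python
def exact_match(predictions, references):
    """Fraction of predictions that equal their reference string exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    # Assumption: surrounding whitespace is ignored; the card does not specify this.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(predictions)


# Illustrative strings only, not taken from the evaluation set.
preds = ["the cat sat", "a dog ran", "birds fly"]
refs = ["the cat sat", "a dog run", "birds fly"]
score = exact_match(preds, refs)  # 2 of the 3 pairs match
```

Under this definition, the reported 0.5464 would mean that roughly 55% of evaluation outputs matched their reference exactly. Bleu appears to be reported on a 0-100 scale.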

Model description

More information needed

Intended uses & limitations

More information needed
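
Until usage is documented, the following is a minimal sketch of how a fine-tuned mT5 checkpoint is typically loaded and run with the Transformers library. The helper name `correct_spelling` is hypothetical, the repository id is taken from the model tree on this page, and the input format (raw text, no task prefix) and `max_new_tokens` value are assumptions that the card does not confirm.

```python
def correct_spelling(text,
                     model_id="SPEAK-PP/mt5-small-si-spelling-correction",
                     max_new_tokens=64):
    """Return the model's output for `text` (downloads weights on first call)."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: the model takes raw text with no task prefix;
    # the card does not document the expected input format.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `correct_spelling("...")` with a misspelled sentence should return the model's proposed correction. For repeated calls, load the tokenizer and model once and reuse them.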

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: fused PyTorch AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 300
num_epochs: 10
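
As a rough sketch, the hyperparameters above can be written as keyword arguments in the style of the Transformers training-arguments API. The argument names follow that API as commonly used and should be checked against the installed version. The single-device assumption is inferred from 16 × 2 = 32, and the total step count of 10610 is taken from the last row of the training results. The `linear_lr` helper is an illustrative reimplementation of a linear warmup and decay schedule, not the library's own code.

```python
# Keyword arguments mirroring the hyperparameters listed above.
training_kwargs = {
    "learning_rate": 5e-4,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 300,
    "num_train_epochs": 10,
}

# Assumption: one device, since 16 * 2 * 1 matches total_train_batch_size = 32.
NUM_DEVICES = 1
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * NUM_DEVICES
)


def linear_lr(step, peak=5e-4, warmup=300, total=10610):
    """Linear warmup to `peak`, then linear decay to 0 at `total` steps."""
    if step < warmup:
        return peak * step / warmup
    return peak * max(0.0, (total - step) / (total - warmup))
```

With these values the learning rate rises to 0.0005 at step 300 and decays linearly to 0 at step 10610. The dictionary could be passed as `Seq2SeqTrainingArguments(output_dir=..., **training_kwargs)`, where the output directory is left for the reader to choose.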

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Exact Match
4.7021	0.1885	200	1.1660	40.8853	0.223
2.2886	0.3770	400	0.8649	50.1692	0.3131
1.9187	0.5655	600	0.7023	55.4593	0.3666
1.7675	0.7540	800	0.6302	56.6642	0.386
1.6678	0.9425	1000	0.5876	58.7598	0.4125
1.4055	1.1310	1200	0.5461	60.4893	0.4343
1.3341	1.3195	1400	0.5383	61.4201	0.4433
1.3279	1.5080	1600	0.5249	62.0838	0.4478
1.3316	1.6965	1800	0.5078	63.1084	0.4589
1.2615	1.8850	2000	0.4807	63.2836	0.4663
1.1390	2.0735	2200	0.4873	63.5105	0.4685
1.0408	2.2620	2400	0.4680	64.0137	0.4732
1.0046	2.4505	2600	0.4737	64.7557	0.4817
1.0253	2.6390	2800	0.4528	65.0559	0.4846
1.0368	2.8275	3000	0.4395	66.2468	0.5016
0.9353	3.0160	3200	0.4449	65.7794	0.4934
0.8893	3.2045	3400	0.4333	66.4373	0.5045
0.8773	3.3930	3600	0.4303	66.2426	0.5034
0.8967	3.5815	3800	0.4264	66.605	0.5056
0.8343	3.7700	4000	0.4217	67.1312	0.5117
0.8601	3.9585	4200	0.4134	67.2582	0.5138
0.7396	4.1470	4400	0.4248	67.6638	0.5175
0.7319	4.3355	4600	0.4190	67.1581	0.5159
0.7838	4.5240	4800	0.4123	67.1691	0.5148
0.7346	4.7125	5000	0.4208	67.6472	0.5244
0.7456	4.9010	5200	0.4142	68.2617	0.5297
0.6373	5.0895	5400	0.4120	68.3017	0.5302
0.6556	5.2780	5600	0.4168	67.9493	0.5231
0.6212	5.4665	5800	0.4146	68.2046	0.5286
0.6766	5.6550	6000	0.4081	68.2101	0.5294
0.6619	5.8435	6200	0.3963	68.5369	0.5339
0.6167	6.0320	6400	0.4075	68.7244	0.5369
0.5865	6.2205	6600	0.4106	69.073	0.5369
0.5998	6.4090	6800	0.4053	68.8086	0.5358
0.6190	6.5975	7000	0.4047	68.9091	0.5363
0.5882	6.7861	7200	0.4065	69.2042	0.5395
0.5885	6.9746	7400	0.4038	69.3257	0.5422
0.5300	7.1631	7600	0.4099	69.2817	0.5416
0.5073	7.3516	7800	0.4143	69.3098	0.5422
0.5250	7.5401	8000	0.4084	69.4062	0.5432
0.5496	7.7286	8200	0.4042	69.3472	0.5416
0.5292	7.9171	8400	0.4038	69.163	0.5416
0.5004	8.1056	8600	0.4077	69.5203	0.5448
0.5138	8.2941	8800	0.4060	69.5977	0.5467
0.4768	8.4826	9000	0.4046	69.7429	0.5461
0.5048	8.6711	9200	0.4042	69.7236	0.5459
0.4695	8.8596	9400	0.4077	69.6651	0.5459
0.4783	9.0481	9600	0.4049	69.5358	0.5448
0.4844	9.2366	9800	0.4079	69.6477	0.5467
0.4485	9.4251	10000	0.4089	69.7673	0.5477
0.4427	9.6136	10200	0.4128	69.679	0.5472
0.4721	9.8021	10400	0.4120	69.7694	0.5469
0.4649	9.9906	10600	0.4104	69.8335	0.5467
0.4649	10.0	10610	0.4102	69.7567	0.5464
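
The table supports two quick derivations, sketched below: the number of optimizer steps per epoch, and which logged checkpoint scores best on each metric. The `rows` list holds only a handful of rows copied from the table, and the training-set size is an approximation that assumes an effective batch of 32 and ignores partial batches.

```python
# (step, validation_loss, bleu, exact_match) for a few rows copied from the table.
rows = [
    (200, 1.1660, 40.8853, 0.2230),
    (6200, 0.3963, 68.5369, 0.5339),
    (10000, 0.4089, 69.7673, 0.5477),
    (10600, 0.4104, 69.8335, 0.5467),
    (10610, 0.4102, 69.7567, 0.5464),
]

TOTAL_STEPS, EPOCHS, EFFECTIVE_BATCH = 10610, 10, 32

steps_per_epoch = TOTAL_STEPS // EPOCHS
# Approximate training-set size, ignoring partial batches.
approx_train_examples = steps_per_epoch * EFFECTIVE_BATCH

# Step of the best row under each metric (lower loss is better, higher scores are better).
best_by_loss = min(rows, key=lambda r: r[1])[0]
best_by_bleu = max(rows, key=lambda r: r[2])[0]
best_by_em = max(rows, key=lambda r: r[3])[0]
```

Over the full table, the lowest validation loss (0.3963) occurs at step 6200, while Bleu and Exact Match peak late in training (69.8335 at step 10600 and 0.5477 at step 10000). The final checkpoint at step 10610 is therefore close to, but not exactly, the best on each metric.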

Framework versions

Transformers 5.5.0
PyTorch 2.8.0+cu128
Datasets 4.8.4
Tokenizers 0.22.2

Safetensors

Model size: 0.3B params
Tensor type: F32

Model tree for SPEAK-PP/mt5-small-si-spelling-correction

Base model: google/mt5-small
Finetuned: this model