mt5-small-si-spelling-correction
This model is a fine-tuned version of google/mt5-small; the training dataset is not specified. It achieves the following results on the evaluation set at the final training step (epoch 10, step 10610):
- Loss: 0.4102
- Bleu: 69.7567
- Exact Match: 0.5464
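The card does not say how Exact Match was computed. As a rough illustration only, a common definition is the fraction of predictions that are string-identical to their references. The sketch below assumes that definition, plus whitespace stripping, which is also an assumption; the function name `exact_match` and the toy strings are hypothetical and are not taken from the evaluation set.

```python
def exact_match(predictions, references):
    """Fraction of predictions identical to their reference string.

    Assumption: leading/trailing whitespace is ignored; the card does not
    state which normalization (if any) the original evaluation applied.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return matches / len(predictions)


# Hypothetical toy data: two of three corrections match exactly.
toy_predictions = ["the cat", "a dog ", "helo"]
toy_references = ["the cat", "a dog", "hello"]
toy_score = exact_match(toy_predictions, toy_references)
```

Under this definition, a score of 0.5464 would mean that a little over half of the evaluation sentences were corrected to exactly the reference form, while BLEU also gives partial credit to near misses.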
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 10
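To make the relationship between these values concrete, the following sketch derives the effective batch size and a linear warmup/decay learning-rate curve from the numbers listed above. It is a plain-Python approximation, not the training code: the single-device assumption, the helper name `linear_lr`, and the total of 10610 optimizer steps (taken from the last row of the results table) are assumptions made for illustration.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 300
NUM_EPOCHS = 10

# Assumption: training ran on a single device.
NUM_DEVICES = 1
# Assumption: total optimizer steps, read off the final row of the results table.
TOTAL_STEPS = 10610

# Effective batch size per optimizer step (matches total_train_batch_size: 32).
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES

# Optimizer steps per epoch implied by the table.
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS


def linear_lr(step, base_lr=LEARNING_RATE, warmup_steps=WARMUP_STEPS,
              total_steps=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With these assumptions, the learning rate peaks at step 300 and then shrinks steadily until the final step, which is consistent with the validation metrics flattening out in the later epochs.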
Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Exact Match |
|---|---|---|---|---|---|
| 4.7021 | 0.1885 | 200 | 1.1660 | 40.8853 | 0.223 |
| 2.2886 | 0.3770 | 400 | 0.8649 | 50.1692 | 0.3131 |
| 1.9187 | 0.5655 | 600 | 0.7023 | 55.4593 | 0.3666 |
| 1.7675 | 0.7540 | 800 | 0.6302 | 56.6642 | 0.386 |
| 1.6678 | 0.9425 | 1000 | 0.5876 | 58.7598 | 0.4125 |
| 1.4055 | 1.1310 | 1200 | 0.5461 | 60.4893 | 0.4343 |
| 1.3341 | 1.3195 | 1400 | 0.5383 | 61.4201 | 0.4433 |
| 1.3279 | 1.5080 | 1600 | 0.5249 | 62.0838 | 0.4478 |
| 1.3316 | 1.6965 | 1800 | 0.5078 | 63.1084 | 0.4589 |
| 1.2615 | 1.8850 | 2000 | 0.4807 | 63.2836 | 0.4663 |
| 1.1390 | 2.0735 | 2200 | 0.4873 | 63.5105 | 0.4685 |
| 1.0408 | 2.2620 | 2400 | 0.4680 | 64.0137 | 0.4732 |
| 1.0046 | 2.4505 | 2600 | 0.4737 | 64.7557 | 0.4817 |
| 1.0253 | 2.6390 | 2800 | 0.4528 | 65.0559 | 0.4846 |
| 1.0368 | 2.8275 | 3000 | 0.4395 | 66.2468 | 0.5016 |
| 0.9353 | 3.0160 | 3200 | 0.4449 | 65.7794 | 0.4934 |
| 0.8893 | 3.2045 | 3400 | 0.4333 | 66.4373 | 0.5045 |
| 0.8773 | 3.3930 | 3600 | 0.4303 | 66.2426 | 0.5034 |
| 0.8967 | 3.5815 | 3800 | 0.4264 | 66.605 | 0.5056 |
| 0.8343 | 3.7700 | 4000 | 0.4217 | 67.1312 | 0.5117 |
| 0.8601 | 3.9585 | 4200 | 0.4134 | 67.2582 | 0.5138 |
| 0.7396 | 4.1470 | 4400 | 0.4248 | 67.6638 | 0.5175 |
| 0.7319 | 4.3355 | 4600 | 0.4190 | 67.1581 | 0.5159 |
| 0.7838 | 4.5240 | 4800 | 0.4123 | 67.1691 | 0.5148 |
| 0.7346 | 4.7125 | 5000 | 0.4208 | 67.6472 | 0.5244 |
| 0.7456 | 4.9010 | 5200 | 0.4142 | 68.2617 | 0.5297 |
| 0.6373 | 5.0895 | 5400 | 0.4120 | 68.3017 | 0.5302 |
| 0.6556 | 5.2780 | 5600 | 0.4168 | 67.9493 | 0.5231 |
| 0.6212 | 5.4665 | 5800 | 0.4146 | 68.2046 | 0.5286 |
| 0.6766 | 5.6550 | 6000 | 0.4081 | 68.2101 | 0.5294 |
| 0.6619 | 5.8435 | 6200 | 0.3963 | 68.5369 | 0.5339 |
| 0.6167 | 6.0320 | 6400 | 0.4075 | 68.7244 | 0.5369 |
| 0.5865 | 6.2205 | 6600 | 0.4106 | 69.073 | 0.5369 |
| 0.5998 | 6.4090 | 6800 | 0.4053 | 68.8086 | 0.5358 |
| 0.6190 | 6.5975 | 7000 | 0.4047 | 68.9091 | 0.5363 |
| 0.5882 | 6.7861 | 7200 | 0.4065 | 69.2042 | 0.5395 |
| 0.5885 | 6.9746 | 7400 | 0.4038 | 69.3257 | 0.5422 |
| 0.5300 | 7.1631 | 7600 | 0.4099 | 69.2817 | 0.5416 |
| 0.5073 | 7.3516 | 7800 | 0.4143 | 69.3098 | 0.5422 |
| 0.5250 | 7.5401 | 8000 | 0.4084 | 69.4062 | 0.5432 |
| 0.5496 | 7.7286 | 8200 | 0.4042 | 69.3472 | 0.5416 |
| 0.5292 | 7.9171 | 8400 | 0.4038 | 69.163 | 0.5416 |
| 0.5004 | 8.1056 | 8600 | 0.4077 | 69.5203 | 0.5448 |
| 0.5138 | 8.2941 | 8800 | 0.4060 | 69.5977 | 0.5467 |
| 0.4768 | 8.4826 | 9000 | 0.4046 | 69.7429 | 0.5461 |
| 0.5048 | 8.6711 | 9200 | 0.4042 | 69.7236 | 0.5459 |
| 0.4695 | 8.8596 | 9400 | 0.4077 | 69.6651 | 0.5459 |
| 0.4783 | 9.0481 | 9600 | 0.4049 | 69.5358 | 0.5448 |
| 0.4844 | 9.2366 | 9800 | 0.4079 | 69.6477 | 0.5467 |
| 0.4485 | 9.4251 | 10000 | 0.4089 | 69.7673 | 0.5477 |
| 0.4427 | 9.6136 | 10200 | 0.4128 | 69.679 | 0.5472 |
| 0.4721 | 9.8021 | 10400 | 0.4120 | 69.7694 | 0.5469 |
| 0.4649 | 9.9906 | 10600 | 0.4104 | 69.8335 | 0.5467 |
| 0.4649 | 10.0 | 10610 | 0.4102 | 69.7567 | 0.5464 |
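The final row is not the best row on every metric. As a small illustration, the sketch below parses a few rows copied from the table above and picks the best step per metric. Only four rows are pasted to keep the example short; the helper names `parse_rows` and `best_step` are hypothetical.

```python
# A subset of rows copied verbatim from the results table above.
ROWS = """
| 0.6619 | 5.8435 | 6200 | 0.3963 | 68.5369 | 0.5339 |
| 0.4485 | 9.4251 | 10000 | 0.4089 | 69.7673 | 0.5477 |
| 0.4649 | 9.9906 | 10600 | 0.4104 | 69.8335 | 0.5467 |
| 0.4649 | 10.0 | 10610 | 0.4102 | 69.7567 | 0.5464 |
"""

COLUMNS = ["train_loss", "epoch", "step", "val_loss", "bleu", "exact_match"]


def parse_rows(text):
    """Turn pipe-delimited table rows into a list of dicts of floats."""
    records = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        records.append(dict(zip(COLUMNS, (float(cell) for cell in cells))))
    return records


def best_step(records, metric, lower_is_better=False):
    """Return the step whose value for `metric` is best."""
    chooser = min if lower_is_better else max
    best = chooser(records, key=lambda record: record[metric])
    return int(best["step"])


records = parse_rows(ROWS)
best_val_loss_step = best_step(records, "val_loss", lower_is_better=True)
best_bleu_step = best_step(records, "bleu")
best_exact_match_step = best_step(records, "exact_match")
```

Among the logged evaluations, validation loss is lowest at step 6200 (0.3963), BLEU is highest at step 10600 (69.8335), and Exact Match is highest at step 10000 (0.5477). The differences from the final checkpoint are small. The card does not state which checkpoint was published.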
Framework versions
- Transformers 5.5.0
- Pytorch 2.8.0+cu128
- Datasets 4.8.4
- Tokenizers 0.22.2
Model tree for SPEAK-PP/mt5-small-si-spelling-correction
Base model: google/mt5-small