Whisper Small Mandarin -> Cantonese - Chengyi Li

This model is a fine-tuned version of openai/whisper-small on the Common Voice 24.0 - Cantonese dataset. It achieves the following results on the evaluation set:

  • Best CER: 12.7222
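
For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate (CER) is typically computed: the character-level edit distance between reference and hypothesis, divided by the number of reference characters. It assumes the figure above is a percentage and that no text normalization is applied; the card states neither, and the function names here are illustrative rather than the evaluation code actually used.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer_percent(references, hypotheses) -> float:
    """Corpus-level CER in percent: total edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * errors / total_chars


# One substituted character out of five reference characters.
print(cer_percent(["我哋去食飯"], ["我地去食飯"]))  # 20.0
```

Because Cantonese is written without spaces between words, a character-level metric avoids the need for a word segmenter, which is one common reason CER is reported instead of WER for Chinese-language ASR.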

Model description

This model was first fine-tuned on 1,000 hours of Mandarin and then on 300 hours of Cantonese. The final model is a Cantonese ASR (automatic speech recognition) model.
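
A minimal inference sketch, assuming the checkpoint is published on the Hub as chengyili2005/whisper-small-mando-canto and loads through the standard transformers speech-recognition pipeline. The helper name `transcribe` is hypothetical. The card does not say which language token was used during fine-tuning, so no language is forced here.

```python
MODEL_ID = "chengyili2005/whisper-small-mando-canto"  # assumed Hub repository id


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import pipeline

    # Downloads the weights on first use; decoding a file path needs ffmpeg.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]


# Example call (the file name is a placeholder):
# print(transcribe("sample_cantonese.wav"))
```

Building the pipeline inside the function keeps the sketch short; for more than a handful of files it would be cheaper to create the pipeline once and reuse it.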

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training was run on Google Colab Pro using an L4 GPU.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 10000
  • mixed_precision_training: Native AMP
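
The list above maps fairly directly onto transformers training arguments. The sketch below is one plausible reconstruction, not the author's actual script: it assumes `Seq2SeqTrainingArguments` was used, that the batch sizes are per device on a single GPU, and that "Native AMP" means fp16. The names `TRAINING_KWARGS`, `build_training_args`, and `linear_schedule_lr` are illustrative.

```python
# Hyperparameters from the list above, under the assumptions stated in the lead-in.
TRAINING_KWARGS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumes a single GPU, no accumulation
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 10_000,
    "fp16": True,  # assumption: "Native AMP" taken to mean fp16
}


def build_training_args(output_dir: str):
    """Build the arguments object; fp16=True expects a CUDA device."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_schedule_lr(step: int, base_lr: float = 1e-5,
                       warmup: int = 500, total: int = 10_000) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


for step in (0, 250, 500, 5250, 10_000):
    print(step, linear_schedule_lr(step))
```

Under this schedule the learning rate climbs from 0 to 1e-05 over the first 500 steps, then falls linearly and reaches 0 at step 10000.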

Training results

Step     Epoch     Training Loss   Validation Loss   CER
1000     2.1552    0.0165          0.4200            13.7624
2000     4.3103    0.0044          0.4335            13.4187
3000     6.4655    0.0035          0.4366            13.2166
4000     8.6207    0.0009          0.4526            13.2092
5000     10.7759   0.0041          0.4623            12.7258
6000     12.9310   0.0002          0.4831            12.8821
7000     15.0862   0.0001          0.4738            12.8085
8000     17.2414   0.0             0.4911            12.7571
9000     19.3966   0.0             0.4940            12.7644
10000    21.5517   0.0             0.4950            12.7222
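
As a quick check on the figures in this table, the sketch below picks out the checkpoint with the lowest CER, the relative CER change from the first evaluation to the best one, and the number of optimizer steps per epoch implied by the Step and Epoch columns. The variable names are illustrative; the numbers are copied from the table.

```python
# (step, epoch, training_loss, validation_loss, cer) rows from the table above.
ROWS = [
    (1000, 2.1552, 0.0165, 0.4200, 13.7624),
    (2000, 4.3103, 0.0044, 0.4335, 13.4187),
    (3000, 6.4655, 0.0035, 0.4366, 13.2166),
    (4000, 8.6207, 0.0009, 0.4526, 13.2092),
    (5000, 10.7759, 0.0041, 0.4623, 12.7258),
    (6000, 12.9310, 0.0002, 0.4831, 12.8821),
    (7000, 15.0862, 0.0001, 0.4738, 12.8085),
    (8000, 17.2414, 0.0, 0.4911, 12.7571),
    (9000, 19.3966, 0.0, 0.4940, 12.7644),
    (10000, 21.5517, 0.0, 0.4950, 12.7222),
]

# Checkpoint with the lowest CER.
best_step, _, _, _, best_cer = min(ROWS, key=lambda row: row[4])

# Relative CER reduction between the first evaluation and the best one.
first_cer = ROWS[0][4]
relative_gain = 100.0 * (first_cer - best_cer) / first_cer

# Optimizer steps per epoch implied by the last row.
steps_per_epoch = round(ROWS[-1][0] / ROWS[-1][1])

print(best_step, best_cer)
print(f"{relative_gain:.2f}% relative CER reduction")
print(steps_per_epoch, "steps per epoch")
```

The lowest CER falls on the final checkpoint, which matches the best CER reported at the top of the card. Validation loss rises over the same span while CER falls, so the two metrics would select different checkpoints.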

Framework versions

  • Transformers 4.52.0
  • Pytorch 2.9.0+cu126
  • Datasets 4.4.2
  • Tokenizers 0.21.4

Model size

  • 0.2B params (Safetensors, tensor type F32)

Model tree for chengyili2005/whisper-small-mando-canto

  • Fine-tuned from openai/whisper-small