Whisper Small Mandarin -> Cantonese - Chengyi Li

This model is a fine-tuned version of openai/whisper-small on the Common Voice 24.0 - Cantonese dataset. It achieves the following results on the evaluation set:

Best CER: 12.7222
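
The card does not say how CER was computed or whether any text normalization was applied. As a rough sketch, assuming the figure is a percentage (character-level edit distance divided by the number of reference characters, times 100), it could be reproduced like this. The function names are illustrative and not taken from the training code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference char missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra char in hypothesis
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer_percent(references, hypotheses):
    """Corpus-level CER in percent (assumed scale; the card does not state it)."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return 100.0 * errors / total_chars


# One substituted character out of five reference characters.
print(cer_percent(["我哋去食飯"], ["我們去食飯"]))  # 20.0
```

Under that assumption, a CER of 12.7222 would mean roughly one character error for every eight reference characters.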

Model description

This model was first fine-tuned on 1,000 hours of Mandarin speech and then on 300 hours of Cantonese speech. The final model is a Cantonese ASR model.
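
A minimal sketch of transcribing a Cantonese audio file with this model, assuming the repository id chengyili2005/whisper-small-mando-canto and the standard Transformers automatic-speech-recognition pipeline. The audio file name is hypothetical, and decoding a file path typically requires ffmpeg to be installed.

```python
MODEL_ID = "chengyili2005/whisper-small-mando-canto"


def transcribe(audio_path: str) -> str:
    """Return the transcription of one audio file.

    The import is kept inside the function so this module can be loaded
    even where Transformers is not installed.
    """
    from transformers import pipeline

    # Downloads the checkpoint on first use and runs on CPU by default.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    result = asr(audio_path)
    return result["text"]


# Example call (hypothetical file name):
# print(transcribe("sample_cantonese.wav"))
```

Building the pipeline once and reusing it is preferable when transcribing many files; it is created inside the function here only to keep the sketch short.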

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training was run on Google Colab Pro using an L4 GPU.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 10000
mixed_precision_training: Native AMP
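
The hyperparameters above can be sketched as keyword arguments for Seq2SeqTrainingArguments, together with the learning rate that a linear schedule with 500 warmup steps yields at a given step. This is a reconstruction, not the author's training script. It assumes that train_batch_size is the per-device batch size with no gradient accumulation and that "Native AMP" corresponds to fp16=True. The output directory is left to the caller.

```python
# Keyword arguments mirroring the listed hyperparameters (assumed mapping).
HYPERPARAMS = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 16,  # assumes a single device, no accumulation
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "warmup_steps": 500,
    "max_steps": 10000,
    "fp16": True,  # assumption: "Native AMP" refers to fp16 mixed precision
}


def build_training_args(output_dir: str):
    """Build the arguments object; requires Transformers and, for fp16, a GPU."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def linear_lr(step: int, base_lr: float = 1e-5,
              warmup: int = 500, total: int = 10000) -> float:
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


for s in (0, 250, 500, 5000, 10000):
    print(s, linear_lr(s))
```

The rate climbs from 0 to 1e-05 over the first 500 steps and then falls linearly to 0 at step 10000.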

Training results

Step	Epoch	Training Loss	Validation Loss	CER
1000	2.1552	0.0165	0.4200	13.7624
2000	4.3103	0.0044	0.4335	13.4187
3000	6.4655	0.0035	0.4366	13.2166
4000	8.6207	0.0009	0.4526	13.2092
5000	10.7759	0.0041	0.4623	12.7258
6000	12.9310	0.0002	0.4831	12.8821
7000	15.0862	0.0001	0.4738	12.8085
8000	17.2414	0.0	0.4911	12.7571
9000	19.3966	0.0	0.4940	12.7644
10000	21.5517	0.0	0.4950	12.7222
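
To make the trend in the table easier to read, the short sketch below copies the rows verbatim and picks out the checkpoint with the lowest CER, the checkpoint with the lowest validation loss, and the approximate number of optimizer steps per epoch. The variable names are illustrative.

```python
# (step, epoch, training_loss, validation_loss, cer), copied from the table.
ROWS = [
    (1000, 2.1552, 0.0165, 0.4200, 13.7624),
    (2000, 4.3103, 0.0044, 0.4335, 13.4187),
    (3000, 6.4655, 0.0035, 0.4366, 13.2166),
    (4000, 8.6207, 0.0009, 0.4526, 13.2092),
    (5000, 10.7759, 0.0041, 0.4623, 12.7258),
    (6000, 12.9310, 0.0002, 0.4831, 12.8821),
    (7000, 15.0862, 0.0001, 0.4738, 12.8085),
    (8000, 17.2414, 0.0, 0.4911, 12.7571),
    (9000, 19.3966, 0.0, 0.4940, 12.7644),
    (10000, 21.5517, 0.0, 0.4950, 12.7222),
]

best_cer_row = min(ROWS, key=lambda row: row[4])
best_val_loss_row = min(ROWS, key=lambda row: row[3])

# Steps divided by epochs gives the number of optimizer steps in one epoch.
steps_per_epoch = round(ROWS[-1][0] / ROWS[-1][1])

print("lowest CER:", best_cer_row[4], "at step", best_cer_row[0])
print("lowest validation loss:", best_val_loss_row[3], "at step", best_val_loss_row[0])
print("steps per epoch:", steps_per_epoch)
```

The lowest CER (12.7222) is reached at step 10000, while validation loss is lowest at step 1000 and rises afterwards, so the two metrics disagree about the best checkpoint. Most of the CER gain is already in place by step 5000 (12.7258).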

Framework versions

Transformers 4.52.0
Pytorch 2.9.0+cu126
Datasets 4.4.2
Tokenizers 0.21.4

Model size: 0.2B params
Tensor type: F32 (Safetensors)

Model tree for chengyili2005/whisper-small-mando-canto

Base model: openai/whisper-small
