Whisper Small Mandarin - Chengyi Li

This model is a fine-tuned version of openai/whisper-small on the Common Voice 24.0 - Mandarin dataset. It achieves the following results on the evaluation set:

  • Best CER (character error rate): 13.5475
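For inference, the checkpoint can be loaded through the transformers speech-recognition pipeline. This is a minimal sketch: the repo id chengyili2005/whisper-small-mandarin matches this card, but "sample.wav" is a placeholder audio path.

```python
from transformers import pipeline

# Minimal inference sketch; "sample.wav" is a placeholder audio file.
asr = pipeline(
    "automatic-speech-recognition",
    model="chengyili2005/whisper-small-mandarin",
)
print(asr("sample.wav")["text"])
```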

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The Common Voice 24.0 - Mandarin dataset was used to train this model, which was fine-tuned first on 1,000 hours of Mandarin and then on 300 hours of Cantonese.
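As a rough sketch, the Mandarin split can be pulled with the datasets library. The repo id mozilla-foundation/common_voice_24_0 and the zh-CN config are assumptions based on the Hub's Common Voice naming convention, and Common Voice releases are gated, so authentication may be required.

```python
from datasets import load_dataset

# Assumed repo id and config; Common Voice releases on the Hub are gated,
# so a logged-in Hub token may be needed.
common_voice = load_dataset(
    "mozilla-foundation/common_voice_24_0",
    "zh-CN",
    split="train",
)
print(common_voice)
```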

Training procedure

Training was run on Google Colab Pro using an NVIDIA L4 GPU.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 5000
  • mixed_precision_training: Native AMP
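These settings map onto transformers' Seq2SeqTrainingArguments roughly as follows. This is a reconstruction, not the original training script: output_dir and the evaluation cadence are assumptions (the cadence is inferred from the 1,000-step intervals in the results table below).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-mandarin",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    fp16=True,              # "Native AMP" mixed-precision training
    eval_strategy="steps",  # assumed from the results table
    eval_steps=1000,        # assumed from the results table
)
```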

Training results

| Step | Epoch  | Training Loss | Validation Loss | CER     |
|-----:|-------:|--------------:|----------------:|--------:|
| 1000 | 0.5411 | 0.0951        | 0.3452          | 14.5735 |
| 2000 | 1.0823 | 0.0843        | 0.3453          | 14.4748 |
| 3000 | 1.6234 | 0.0891        | 0.3467          | 13.9116 |
| 4000 | 2.1645 | 0.0499        | 0.3476          | 13.7131 |
| 5000 | 2.7056 | 0.0329        | 0.3489          | 13.5475 |
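For reference, CER here is the character error rate in percent: character-level edit distance divided by the number of reference characters. Below is a sketch using the evaluate library's cer metric (an assumption about how the numbers above were produced; the strings are made-up examples).

```python
import evaluate

cer_metric = evaluate.load("cer")  # character error rate

# Hypothetical prediction/reference pair for illustration:
# one substituted character out of six -> CER of ~16.67%.
predictions = ["今天天气很好"]
references = ["今天天气真好"]
cer = 100 * cer_metric.compute(predictions=predictions, references=references)
print(f"CER: {cer:.4f}")
```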

Framework versions

  • Transformers 4.52.0
  • PyTorch 2.9.0+cu126
  • Datasets 4.4.2
  • Tokenizers 0.21.4