Whisper Small Mandarin - Chengyi Li

This model is a fine-tuned version of openai/whisper-small on the Common Voice 24.0 - Mandarin dataset. It achieves the following results on the evaluation set:

  • Best CER (character error rate): 13.5475
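For inference, the checkpoint can be loaded through the transformers speech-recognition pipeline. This is a minimal sketch: the repo id chengyili2005/whisper-small-mandarin matches this card, but "sample.wav" is a placeholder audio path.

```python
from transformers import pipeline

# Minimal inference sketch; "sample.wav" is a placeholder audio file.
asr = pipeline(
    "automatic-speech-recognition",
    model="chengyili2005/whisper-small-mandarin",
)
print(asr("sample.wav")["text"])
```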

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The Common Voice 24.0 - Mandarin dataset was used to train this model, which was fine-tuned first on 1,000 hours of Mandarin and then on 300 hours of Cantonese.
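As a rough sketch, the Mandarin split can be pulled with the datasets library. The repo id mozilla-foundation/common_voice_24_0 and the zh-CN config are assumptions based on the Hub's Common Voice naming convention, and Common Voice releases are gated, so authentication may be required.

```python
from datasets import load_dataset

# Assumed repo id and config; Common Voice releases on the Hub are gated,
# so a logged-in Hub token may be needed.
common_voice = load_dataset(
    "mozilla-foundation/common_voice_24_0",
    "zh-CN",
    split="train",
)
print(common_voice)
```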

Training procedure

Training was run on Google Colab Pro using an NVIDIA L4 GPU.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 5000
  • mixed_precision_training: Native AMP
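These settings map onto transformers' Seq2SeqTrainingArguments roughly as follows. This is a reconstruction, not the original training script: output_dir and the evaluation cadence are assumptions (the cadence is inferred from the 1,000-step intervals in the results table below).

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-mandarin",  # assumed
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=5000,
    fp16=True,              # "Native AMP" mixed-precision training
    eval_strategy="steps",  # assumed from the results table
    eval_steps=1000,        # assumed from the results table
)
```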

Training results

| Step | Epoch  | Training Loss | Validation Loss | CER     |
|-----:|-------:|--------------:|----------------:|--------:|
| 1000 | 0.5411 | 0.0951        | 0.3452          | 14.5735 |
| 2000 | 1.0823 | 0.0843        | 0.3453          | 14.4748 |
| 3000 | 1.6234 | 0.0891        | 0.3467          | 13.9116 |
| 4000 | 2.1645 | 0.0499        | 0.3476          | 13.7131 |
| 5000 | 2.7056 | 0.0329        | 0.3489          | 13.5475 |
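For reference, CER here is the character error rate in percent: character-level edit distance divided by the number of reference characters. Below is a sketch using the evaluate library's cer metric (an assumption about how the numbers above were produced; the strings are made-up examples).

```python
import evaluate

cer_metric = evaluate.load("cer")  # character error rate

# Hypothetical prediction/reference pair for illustration:
# one substituted character out of six -> CER of ~16.67%.
predictions = ["今天天气很好"]
references = ["今天天气真好"]
cer = 100 * cer_metric.compute(predictions=predictions, references=references)
print(f"CER: {cer:.4f}")
```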

Framework versions

  • Transformers 4.52.0
  • PyTorch 2.9.0+cu126
  • Datasets 4.4.2
  • Tokenizers 0.21.4