Whisper Base Urdu ASR Model

This model is a fine-tuned version of openai/whisper-base on the common_voice_17_0 dataset.

Usage

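from transformers import pipeline

# Load the fine-tuned checkpoint as a speech-recognition pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-base-urdu-full",
)

# Clear any forced decoder prompt and set the decoding language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file; the result is a dict with a "text" key.
transcription = transcriber("audio2.mp3")
print(transcription)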

{'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 200
training_steps: 1500
mixed_precision_training: Native AMP
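
To make the scheduler settings above concrete, here is a minimal sketch of the learning rate over the 1500 training steps. It assumes the common form of a cosine schedule with warmup (linear ramp to the peak rate, then half a cosine cycle decaying to zero); the card does not state the exact variant, and the constant and function names are illustrative only.

```python
import math

# Values taken from the hyperparameter list above.
PEAK_LR = 2e-5        # learning_rate
WARMUP_STEPS = 200    # lr_scheduler_warmup_steps
TOTAL_STEPS = 1500    # training_steps


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step.

    Assumed shape: linear warmup to PEAK_LR, then half a cosine cycle to zero.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at the start, mid-warmup, peak, mid-decay, and final step.
for step in (0, 100, 200, 850, 1500):
    print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

Under this assumption the rate peaks at 2e-05 at step 200 and is back to about half of that by step 850, so the later evaluation rows in the results table were trained at a much smaller rate than the early ones.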

Training results

Training Loss	Epoch	Step	Validation Loss	Wer
0.7511	0.5085	300	0.7027	47.9462
0.6138	1.0169	600	0.6070	44.5482
0.4602	1.5254	900	0.5756	41.2621
0.3916	2.0339	1200	0.5551	40.0672
0.3003	2.5424	1500	0.5551	41.6169
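
The table can be read programmatically to see which evaluation step had the lowest validation WER and roughly how many optimizer steps make up one epoch. The sketch below copies the rows from the table; the examples-per-epoch figure is only an estimate and assumes one batch of 16 per step on a single device with no gradient accumulation, which the card does not confirm.

```python
# (training_loss, epoch, step, validation_loss, wer) rows copied from the table above.
rows = [
    (0.7511, 0.5085, 300, 0.7027, 47.9462),
    (0.6138, 1.0169, 600, 0.6070, 44.5482),
    (0.4602, 1.5254, 900, 0.5756, 41.2621),
    (0.3916, 2.0339, 1200, 0.5551, 40.0672),
    (0.3003, 2.5424, 1500, 0.5551, 41.6169),
]

# Evaluation step with the lowest validation WER.
best = min(rows, key=lambda row: row[4])
print(f"lowest validation WER: {best[4]} at step {best[2]}")

# Steps per epoch, inferred from the first row (step / epoch fraction).
steps_per_epoch = round(rows[0][2] / rows[0][1])
print(f"approx. steps per epoch: {steps_per_epoch}")

# Rough examples per epoch, ASSUMING train_batch_size=16 per step,
# a single device, and no gradient accumulation.
print(f"approx. examples per epoch: {steps_per_epoch * 16}")
```

Training loss keeps falling through step 1500 while validation WER rises slightly after step 1200, a pattern that is consistent with the onset of overfitting.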

Framework versions

Transformers 4.51.3
Pytorch 2.6.0+cu124
Datasets 3.6.0
Tokenizers 0.21.1

Evaluation

Urdu ASR Evaluation on Common Voice 17.0 (Test Split).

Metric	Value	Description
WER	39.124%	Word Error Rate (lower is better)
CER	14.781%	Character Error Rate (lower is better)
BLEU	40.373	BLEU score on a 0-100 scale (higher is better)
ChrF	69.624	Character n-gram F-score (higher is better)

👉 Review the testing script: Testing Whisper Base Urdu Full

Summary:
The Word Error Rate (WER) of 39.12% is high: the transcripts contain roughly two word-level errors for every five reference words. The model does much better at the character level. The Character Error Rate (CER) of 14.78% and the ChrF score of 69.62 suggest that most errors are small, within-word character mistakes: the output is usually close to the reference, but a single wrong character is enough to count the whole word as an error.
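
The gap between WER and CER is easier to see on a toy example. The sketch below is a plain-Python illustration of both metrics using Levenshtein distance; it is not the testing script referenced above, it applies no text normalization, and the function names and the English sample sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edit distance divided by the reference length (spaces included)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "we saw the red boat"
hypothesis = "we sow the rad boat"  # one wrong character in each of two words

print(f"WER: {wer(reference, hypothesis):.3f}")
print(f"CER: {cer(reference, hypothesis):.3f}")
```

Two single-character mistakes make two of the five words wrong (WER 0.4), while only 2 of the 19 reference characters differ (CER about 0.105), which is the same pattern as in the reported scores.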

Safetensors

Model size: 72.6M params
Tensor type: F32

Evaluation results

Self-reported results on the Common Voice 17.0 (Urdu) test set:

WER: 39.124
CER: 14.781
BLEU: 40.373
ChrF: 69.624