Whisper Base Urdu ASR Model
This model is a fine-tuned version of openai/whisper-base on the common_voice_17_0 dataset.
Usage
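from transformers import pipeline

# Build an ASR pipeline backed by the fine-tuned checkpoint.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-base-urdu-full",
)

# Clear any forced decoder prompt and set the decoding language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file; the result is a dict with a "text" key.
transcription = transcriber("audio2.mp3")
print(transcription)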
{'text': 'دیکھیے پانی کپ تک بہتا اور مچھلی کپ تک تیرتی ہے'}
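To transcribe several files in one call, the pipeline can be wrapped in a small helper. The following is a minimal sketch, not part of the original card: `build_transcriber` and `transcribe_files` are hypothetical names, and the sketch assumes the pipeline accepts a list of paths and a `generate_kwargs` dict with `language` and `task`, as the Transformers ASR pipeline does for Whisper checkpoints.

```python
from typing import Callable, Dict, Iterable


def build_transcriber():
    """Create the Urdu ASR pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so that transcribe_files() below has no hard dependency.
    from transformers import pipeline

    transcriber = pipeline(
        "automatic-speech-recognition",
        model="kingabzpro/whisper-base-urdu-full",
    )
    # Same adjustment as in the usage snippet above.
    transcriber.model.generation_config.forced_decoder_ids = None
    return transcriber


def transcribe_files(transcriber: Callable, paths: Iterable[str]) -> Dict[str, str]:
    """Hypothetical helper: map each audio path to its transcribed text."""
    path_list = list(paths)
    # A list input yields one result dict per file, in the same order.
    outputs = transcriber(
        path_list,
        generate_kwargs={"language": "ur", "task": "transcribe"},
    )
    return {path: output["text"] for path, output in zip(path_list, outputs)}


if __name__ == "__main__":
    # "audio2.mp3" comes from the usage snippet; add more paths as needed.
    print(transcribe_files(build_transcriber(), ["audio2.mp3"]))
```

Passing the language per call through `generate_kwargs` leaves the shared generation config untouched, which may be preferable when the same pipeline object is reused elsewhere.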
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- training_steps: 1500
- mixed_precision_training: Native AMP
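The learning-rate settings above imply a schedule that can be sketched in a few lines. This is an illustration under assumptions, not the training code: it assumes linear warmup followed by a single half-cosine decay to zero, which is how the Transformers cosine scheduler with warmup behaves by default. `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 2e-5       # learning_rate
WARMUP_STEPS = 200   # lr_scheduler_warmup_steps
TOTAL_STEPS = 1500   # training_steps


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 100, 200, 850, 1500):
        print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the rate peaks at 2e-05 at step 200, falls to half of that midway through the decay phase (step 850), and reaches zero at step 1500.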
Training results
| Training Loss | Epoch | Step | Validation Loss | WER (%) |
|---|---|---|---|---|
| 0.7511 | 0.5085 | 300 | 0.7027 | 47.9462 |
| 0.6138 | 1.0169 | 600 | 0.6070 | 44.5482 |
| 0.4602 | 1.5254 | 900 | 0.5756 | 41.2621 |
| 0.3916 | 2.0339 | 1200 | 0.5551 | 40.0672 |
| 0.3003 | 2.5424 | 1500 | 0.5551 | 41.6169 |
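The table above also allows a rough back-of-the-envelope check of the training set size and of which checkpoint scored best. The sketch below copies the rows from the table; the example-count estimate assumes a single device and no gradient accumulation, neither of which is stated in the card.

```python
# (training_loss, epoch, step, validation_loss, wer_percent), copied from the table
ROWS = [
    (0.7511, 0.5085, 300, 0.7027, 47.9462),
    (0.6138, 1.0169, 600, 0.6070, 44.5482),
    (0.4602, 1.5254, 900, 0.5756, 41.2621),
    (0.3916, 2.0339, 1200, 0.5551, 40.0672),
    (0.3003, 2.5424, 1500, 0.5551, 41.6169),
]
TRAIN_BATCH_SIZE = 16  # train_batch_size from the hyperparameter list

# Each row gives step / epoch, i.e. optimizer steps per epoch; average and round.
steps_per_epoch = round(
    sum(step / epoch for _, epoch, step, _, _ in ROWS) / len(ROWS)
)

# Assumption: one device, no gradient accumulation, so one step = one batch.
approx_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

# Checkpoint with the lowest validation WER.
best_row = min(ROWS, key=lambda row: row[4])

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("approx. training examples:", approx_train_examples)
    print("best step by WER:", best_row[2], "with WER", best_row[4])
```

By this estimate an epoch is about 590 steps. Validation WER is lowest at step 1200 and rises again at step 1500 while training loss keeps falling.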
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
Evaluation
Urdu ASR Evaluation on Common Voice 17.0 (Test Split).
| Metric | Value | Description |
|---|---|---|
| WER | 39.124% | Word Error Rate (lower is better) |
| CER | 14.781% | Character Error Rate (lower is better) |
| BLEU | 40.373% | BLEU Score (higher is better) |
| ChrF | 69.624 | Character n-gram F-score (higher is better) |
👉 Review the testing script: Testing Whisper Base Urdu Full
Summary:
The Word Error Rate (WER) of 39.12% is a significant weakness: the model makes roughly two word-level errors (substitutions, deletions, or insertions) for every five reference words.
However, the model is much more accurate at the character level. The moderate Character Error Rate (CER) of 14.78% and the strong ChrF score of 69.62 show that it usually predicts most of the characters in a word correctly, even when the word as a whole is wrong.
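The gap between WER and CER is easier to see with a small worked example. The sketch below is not the card's testing script: it is a minimal, dependency-free illustration that computes both metrics with Levenshtein distance, assuming whitespace tokenization for words, spaces counted as characters for CER, and a non-empty reference. The English sentences are made up for the example.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character-level edits divided by the number of reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    reference = "the fish swims to the cup"
    hypothesis = "the fish swim to the cap"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

Here two single-character mistakes make two of six words wrong (WER of about 0.333) while changing only two of 25 characters (CER of 0.08), the same pattern as in the reported metrics.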