# Whisper-Tiny Fine-tuned for Urdu ASR

A fine-tuned version of `openai/whisper-tiny` for Urdu automatic speech recognition.
## Model Performance
| Metric | Score |
|---|---|
| WER | 38.10% |
| CER | 16.00% |
Evaluated on the held-out test split of the Unified Urdu Speech ASR dataset.
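For reference, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words (CER is the same computation at character level). A minimal self-contained sketch of the metric; evaluation libraries such as `jiwer` implement the same idea with additional normalization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

A 38.10% WER means roughly 38 word edits per 100 reference words on the test split.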
## Usage

### Quick Inference (Pipeline API)
```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="abidanoaman/whisper-tiny-urdu-merged-data",
    device=0,  # 0 = first GPU, -1 = CPU
)

result = asr("your_urdu_audio.wav")
print(result["text"])
```
### Manual Inference
```python
import torch
import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")
processor = WhisperProcessor.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load and preprocess audio
waveform, sr = torchaudio.load("your_urdu_audio.wav")
if waveform.shape[0] > 1:  # stereo → mono
    waveform = torch.mean(waveform, dim=0, keepdim=True)
if sr != 16000:  # Whisper expects 16 kHz input
    waveform = torchaudio.transforms.Resample(sr, 16000)(waveform)
waveform = waveform.squeeze().numpy()

# Transcribe
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device)

with torch.no_grad():
    pred_ids = model.generate(
        input_features,
        language="urdu",
        task="transcribe",
        max_new_tokens=225,
    )

transcription = processor.batch_decode(pred_ids, skip_special_tokens=True)[0]
print(transcription)
```
### Resume / Continue Fine-tuning
```python
from transformers import (
    WhisperForConditionalGeneration,
    WhisperProcessor,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    EarlyStoppingCallback,
)

model = WhisperForConditionalGeneration.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")
processor = WhisperProcessor.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")

# Re-apply generation config
model.generation_config.language = "urdu"
model.generation_config.task = "transcribe"
model.generation_config.suppress_tokens = []

# Optional: unfreeze the CNN front-end if adapting to a new domain
# for param in model.model.encoder.conv1.parameters():
#     param.requires_grad = True
# for param in model.model.encoder.conv2.parameters():
#     param.requires_grad = True

# Then set up your dataset, collator, and Seq2SeqTrainer as before.
# See resume_config.json for the full training configuration.
```
## Training Details
| Parameter | Value |
|---|---|
| Base model | openai/whisper-tiny |
| Dataset | Unified Urdu Speech ASR |
| Learning rate | 3e-06 |
| Effective batch size | 16 (8 × 2 grad accum) |
| Warmup steps | 200 |
| Best checkpoint step | 2000 |
| Early stopping | patience=3 (eval every 500 steps) |
| Mixed precision | FP16 |
| Frozen layers | CNN front-end (conv1 + conv2) |
## Text Normalization
Applied before training and evaluation:
- Urdu diacritics (harakat) removed
- Arabic character variants standardized (e.g., أ إ آ → ا)
- Non-Urdu characters stripped
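The first two steps above can be sketched in a few lines of Python. This is an illustration of the described normalization, not the exact training script; the harakat range and the dagger alef are assumptions about which combining marks were removed:

```python
import re

# Combining harakat (U+064B–U+0652: tanween, fatha/kasra/damma, shadda,
# sukun) plus the superscript (dagger) alef U+0670 — assumed range.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Standardize alef variants: أ / إ / آ → ا
ALEF_VARIANTS = str.maketrans({"\u0623": "\u0627",
                               "\u0625": "\u0627",
                               "\u0622": "\u0627"})

def normalize_urdu(text: str) -> str:
    text = DIACRITICS.sub("", text)        # remove harakat
    return text.translate(ALEF_VARIANTS)   # unify alef forms
```

Stripping non-Urdu characters (the third step) would additionally require a whitelist of the Urdu Unicode range, which is omitted here.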
## Files in this Repository
| File | Description |
|---|---|
| `config.json` | Model architecture config |
| `generation_config.json` | Generation settings (language, task, suppress_tokens) |
| `model.safetensors` | Fine-tuned model weights |
| `preprocessor_config.json` | Feature extractor config (mel spectrogram) |
| `tokenizer_config.json` | Tokenizer settings |
| `vocab.json` | Whisper multilingual BPE vocabulary |
| `training_info.json` | Training results and hyperparameters |
| `resume_config.json` | Full config for resuming/replicating training |