# Whisper-Tiny Fine-tuned for Urdu ASR

A fine-tuned version of `openai/whisper-tiny` for Urdu automatic speech recognition.

## Model Performance

| Metric | Score  |
|--------|--------|
| WER    | 38.10% |
| CER    | 16.00% |

Evaluated on the held-out test split of the Unified Urdu Speech ASR dataset.
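WER and CER are word- and character-level edit distances normalized by the reference length. A minimal pure-Python sketch of the computation (production evaluation typically uses a library such as `jiwer`; this is only an illustration of the metric):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (1-D DP rolling row)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                           # deletion
                dp[j - 1] + 1,                       # insertion
                prev + (ref[i - 1] != hyp[j - 1]),   # substitution (or match)
            )
            prev = cur
    return dp[n]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, one substituted word in a four-word reference gives a WER of 0.25.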

## Usage

### Quick Inference (Pipeline API)

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="abidanoaman/whisper-tiny-urdu-merged-data",
    device=0,   # 0 = GPU, -1 = CPU
)

result = asr("your_urdu_audio.wav")
print(result["text"])
```

### Manual Inference

```python
import torch
import torchaudio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")
processor = WhisperProcessor.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Load and preprocess audio
waveform, sr = torchaudio.load("your_urdu_audio.wav")
if waveform.shape[0] > 1:                          # stereo -> mono
    waveform = torch.mean(waveform, dim=0, keepdim=True)
if sr != 16000:                                    # Whisper expects 16 kHz
    waveform = torchaudio.transforms.Resample(sr, 16000)(waveform)
waveform = waveform.squeeze().numpy()

# Transcribe
inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device)

with torch.no_grad():
    pred_ids = model.generate(
        input_features,
        language="urdu",
        task="transcribe",
        max_new_tokens=225,
    )

transcription = processor.batch_decode(pred_ids, skip_special_tokens=True)[0]
print(transcription)
```

### Resume / Continue Fine-tuning

```python
from transformers import (
    WhisperForConditionalGeneration,
    WhisperProcessor,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    EarlyStoppingCallback,
)

model = WhisperForConditionalGeneration.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")
processor = WhisperProcessor.from_pretrained("abidanoaman/whisper-tiny-urdu-merged-data")

# Re-apply generation config
model.generation_config.language = "urdu"
model.generation_config.task = "transcribe"
model.generation_config.suppress_tokens = []

# Optional: unfreeze CNN layers if adapting to a new domain
# for param in model.model.encoder.conv1.parameters():
#     param.requires_grad = True
# for param in model.model.encoder.conv2.parameters():
#     param.requires_grad = True

# Then set up your dataset, collator, and Seq2SeqTrainer as before.
# See resume_config.json for the full training configuration.
```

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | openai/whisper-tiny |
| Dataset | Unified Urdu Speech ASR |
| Learning rate | 3e-6 |
| Effective batch size | 16 (8 × 2 grad accum) |
| Warmup steps | 200 |
| Best checkpoint | step 2000 |
| Early stopping | patience = 3 (eval every 500 steps) |
| Mixed precision | FP16 |
| Frozen layers | CNN front-end (conv1 + conv2) |
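These hyperparameters roughly correspond to the following `Seq2SeqTrainingArguments` sketch. The `output_dir`, save cadence, and metric wiring are illustrative assumptions, not the exact training script (see resume_config.json for the authoritative configuration):

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

# Sketch only: output_dir and save/eval wiring are assumptions.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-urdu",
    learning_rate=3e-6,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size 16
    warmup_steps=200,
    fp16=True,                       # mixed precision
    eval_strategy="steps",           # "evaluation_strategy" on older transformers
    eval_steps=500,
    save_steps=500,
    predict_with_generate=True,      # decode during eval so WER can be computed
    load_best_model_at_end=True,     # required by EarlyStoppingCallback
    metric_for_best_model="wer",
    greater_is_better=False,         # lower WER is better
)
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)
```

Pass `early_stopping` via the trainer's `callbacks=[...]` argument.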

## Text Normalization

Applied before training and evaluation:

- Urdu diacritics (harakat) removed
- Arabic character variants standardized (e.g., أ إ آ → ا)
- Non-Urdu characters stripped
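The steps above can be sketched as follows. The exact Unicode ranges and the set of characters kept (e.g., whether digits survive) are assumptions for illustration; the training pipeline's rules may differ in detail:

```python
import re

# Harakat (tanween, fatha, damma, kasra, shadda, sukun) plus superscript alef
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")

# Hamza/madda forms of alif mapped to bare alif (assumed mapping set)
ARABIC_VARIANTS = {"أ": "ا", "إ": "ا", "آ": "ا"}

def normalize_urdu(text: str) -> str:
    # 1. Remove diacritics
    text = DIACRITICS.sub("", text)
    # 2. Standardize Arabic character variants
    for src, dst in ARABIC_VARIANTS.items():
        text = text.replace(src, dst)
    # 3. Strip non-Urdu characters (keep Arabic-script block, digits, whitespace)
    text = re.sub(r"[^\u0600-\u06FF0-9\s]", "", text)
    # Collapse whitespace left behind by stripping
    return re.sub(r"\s+", " ", text).strip()
```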

## Files in this Repository

| File | Description |
|------|-------------|
| config.json | Model architecture config |
| generation_config.json | Generation settings (language, task, suppress_tokens) |
| model.safetensors | Fine-tuned model weights |
| preprocessor_config.json | Feature extractor config (mel spectrogram) |
| tokenizer_config.json | Tokenizer settings |
| vocab.json | Whisper multilingual BPE vocabulary |
| training_info.json | Training results and hyperparameters |
| resume_config.json | Full config for resuming/replicating training |