Urdu ASR - Fine-tuned XLSR-53

A facebook/wav2vec2-large-xlsr-53 model fine-tuned on Urdu speech data for automatic speech recognition (ASR).

Model Details

  • Base model: facebook/wav2vec2-large-xlsr-53
  • Language: Urdu (ur)
  • Task: Automatic Speech Recognition (ASR)
  • Training data: Unified Urdu Speech ASR Dataset

Usage

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
import torchaudio

model_id = "abidanoaman/urdu-asr-wave2vec2-base-merged-optimized"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load audio; the model expects 16 kHz mono input
waveform, sr = torchaudio.load("your_audio.wav")
waveform = waveform.mean(dim=0)  # mix all channels down to mono
if sr != 16000:
    # Resample so the audio matches the sampling rate passed to the processor
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy decoding: pick the most likely token for each frame
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])
print(transcription)
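
The argmax and decode steps above amount to greedy CTC decoding: take the most likely token id per frame, collapse consecutive repeats, then drop the blank token. Here is a minimal pure-Python sketch of that collapse. The toy vocabulary, the blank id 0, and the helper names `ctc_greedy_collapse` and `ids_to_text` are assumptions made for illustration; the real vocabulary and blank id come from the model's tokenizer.

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration only (not the model's real one).
# Id 0 plays the role of the CTC blank; "|" marks a word boundary.
TOY_VOCAB = {0: "<blank>", 1: "a", 2: "b", 3: "|"}
BLANK_ID = 0


def ctc_greedy_collapse(frame_ids, blank_id=BLANK_ID):
    """Collapse consecutive repeated ids, then remove blank ids."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in collapsed if token_id != blank_id]


def ids_to_text(token_ids, vocab=TOY_VOCAB, word_delimiter="|"):
    """Map ids to characters and turn the word delimiter into a space."""
    return "".join(vocab[i] for i in token_ids).replace(word_delimiter, " ")


# Per-frame argmax ids, as torch.argmax(logits, dim=-1)[0].tolist() would give.
frame_ids = [0, 1, 1, 0, 1, 2, 2, 0, 3, 3, 2, 0]
token_ids = ctc_greedy_collapse(frame_ids)
print(token_ids)               # [1, 1, 2, 3, 2]
print(ids_to_text(token_ids))  # aab b
```

The blank between the two runs of id 1 is what lets a doubled character survive the collapse; without it, the repeated frames would merge into a single token.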

Training Configuration

  • Epochs: 30
  • Batch size: 2 (gradient accumulation steps: 4, giving an effective batch size of 8 per device)
  • Learning rate: 0.0001
  • Mixed precision: enabled
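
To show how these settings interact, the sketch below stores them in a small dataclass and derives the effective batch size and an approximate number of optimizer updates. The `TrainConfig` class, its field names, and the example dataset size of 1000 utterances are hypothetical; the card does not state the dataset size or the training script that was used.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values taken from the configuration listed above.
    epochs: int = 30
    per_step_batch_size: int = 2
    grad_accum_steps: int = 4
    learning_rate: float = 1e-4
    mixed_precision: bool = True

    @property
    def effective_batch_size(self) -> int:
        # Samples contributing to each optimizer update on a single device.
        return self.per_step_batch_size * self.grad_accum_steps

    def optimizer_steps(self, num_train_examples: int) -> int:
        # Approximate total optimizer updates, assuming the last partial
        # batch of each epoch still triggers an update.
        per_epoch = math.ceil(num_train_examples / self.effective_batch_size)
        return per_epoch * self.epochs


cfg = TrainConfig()
print(cfg.effective_batch_size)   # 8
print(cfg.optimizer_steps(1000))  # 3750, for a hypothetical 1000-utterance set
```

Accumulating gradients over 4 steps gives the optimizer the same number of samples per update as a batch of 8 would, while only 2 samples need to fit in memory at a time.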

Checkpoint

  • Format: Safetensors
  • Model size: 94.4M params
  • Tensor type: F32