Fine-Tuned Whisper Model for Pashto

This fine-tuned Whisper Medium model provides high-quality Pashto speech-to-text transcription, optimized for diverse accents and noisy environments.

Model Details

Developed by: AbdulMoizShah01

Shared by: AbdulMoizShah01/pashto-whisper-medium

Model type: Encoder–decoder sequence-to-sequence ASR

Language(s): Pashto (ps)

License: Apache 2.0

Fine-tuned from: openai/whisper-medium

Uses

Direct Use

ASR transcription: Convert Pashto audio files (.wav, .mp3, .flac) sampled ≥16 kHz into text.

Research: Evaluate Pashto speech recognition in academic settings.

Downstream Use

Captioning & subtitles for Pashto media.

Preprocessing step in Pashto NLP pipelines (e.g., speech analytics).

Out-of-Scope Use

Non-Pashto languages or code-switching beyond simple borrowings.

Speech translation: the model transcribes Pashto audio to Pashto text; it does not translate into other languages.

Bias, Risks, and Limitations

Trained primarily on Mozilla Common Voice Pashto and supplemental domain recordings; may underperform on rare dialects or highly noisy audio.

Potential bias toward speakers in urban settings.

May mis-transcribe uncommon proper nouns or technical terms.

Recommendations

Validate outputs when using in critical settings (e.g., legal or medical transcription).

Provide clear audio sampling at ≥16 kHz for best accuracy.

How to Get Started with the Model

```python
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load processor and model
processor = WhisperProcessor.from_pretrained("AbdulMoizShah01/pashto-whisper-medium")
model = WhisperForConditionalGeneration.from_pretrained("AbdulMoizShah01/pashto-whisper-medium")
model.eval()

# Load and preprocess audio (resampled to 16 kHz)
audio, sr = librosa.load("path/to/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Transcribe
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

Training Details

Training Data

Primary dataset: Mozilla Common Voice Pashto (~200 hours).

Supplementary data: Domain recordings from news broadcasts, podcasts, and conversational samples (~300 hours).

Preprocessing: Noise reduction, normalization, and segmentation into 30 s clips.
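The 30 s segmentation step above can be sketched as follows. This is a minimal illustration using NumPy; the project's actual preprocessing pipeline is not published, so the function name and exact chunking behavior (a shorter final clip rather than padding) are assumptions.

```python
import numpy as np

def segment_audio(audio: np.ndarray, sr: int = 16000, clip_seconds: int = 30):
    """Split a mono waveform into fixed-length clips (last clip may be shorter)."""
    clip_len = sr * clip_seconds
    return [audio[i:i + clip_len] for i in range(0, len(audio), clip_len)]

# Example: 75 seconds of audio at 16 kHz yields three clips (30 s, 30 s, 15 s)
audio = np.zeros(16000 * 75)
clips = segment_audio(audio)
print([len(c) / 16000 for c in clips])  # [30.0, 30.0, 15.0]
```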

Training Procedure

Framework: PyTorch & Hugging Face Transformers

Optimizer: AdamW

Learning rate: 5e-6, linear decay

Batch size: 32

Epochs: 100

Hardware: NVIDIA GPUs (PAF-IAST)
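The linear decay schedule listed above scales the base learning rate by the fraction of training steps remaining; a minimal sketch (the total step count here is illustrative, not a published training detail):

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float = 5e-6) -> float:
    """Linearly decay the learning rate from base_lr to 0 over total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# The full rate applies at step 0 and reaches zero at the final step.
print(linear_decay_lr(0, 1000))     # 5e-06
print(linear_decay_lr(500, 1000))   # 2.5e-06
print(linear_decay_lr(1000, 1000))  # 0.0
```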

Evaluation

| Metric | Value |
|---|---|
| Word Error Rate (WER) | 8.5% |
| Character Error Rate (CER) | 3.1% |
| In-domain accuracy | 91.5% |
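WER, reported above, is the word-level edit distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the number of reference words. A self-contained sketch of the metric (libraries such as jiwer provide a production implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("a b c d", "a x c"))  # 1 substitution + 1 deletion over 4 words -> 0.5
```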
