# Whisper Small - Hindi Fine-Tuned
This model is a fine-tuned version of openai/whisper-small for Hindi Speech-to-Text (ASR) applications. It was fine-tuned on the hi_in configuration of the Google FLEURS dataset.
The fine-tuning process significantly improved the transcription accuracy, reducing the Word Error Rate (WER) from 68.61% (baseline) down to 26.87%, and the Character Error Rate (CER) from 34.43% down to 10.41%.
## Model Details

- Base Model: `openai/whisper-small` (244M parameters)
- Language: Hindi (`hi`)
- Task: Automatic Speech Recognition (ASR)
- Dataset: Google FLEURS (Hindi - India)
- License: MIT
## Evaluation Results
The model was evaluated on the strictly held-out test split (418 samples) of the FLEURS Hindi dataset.
| Metric | Whisper Small (Base) | Whisper Small (Fine-Tuned) | Relative Improvement |
|---|---|---|---|
| Word Error Rate (WER) | 68.61% | 26.87% | ↓ 60.8% |
| Character Error Rate (CER) | 34.43% | 10.41% | ↓ 69.8% |
These results demonstrate a successful adaptation to Hindi phonetics and the Devanagari script, with a substantial reduction in transcription errors.
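For reference, both metrics are edit-distance ratios: WER counts word-level substitutions, insertions, and deletions against the reference word count, while CER does the same at the character level. A minimal self-contained sketch (helper names are illustrative; libraries such as `jiwer` provide the same computation):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # min of deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Dropping one word from a five-word reference, for example, yields a WER of 0.2.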
## Usage

You can use this model directly with the Hugging Face `transformers` library:
```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa

# Load the fine-tuned model and processor
model_id = "rishii100/whisper-small-hindi"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load your Hindi audio file (resampled to 16 kHz, Whisper's expected rate)
audio_path = "path/to/your/hindi_audio.wav"
audio, sr = librosa.load(audio_path, sr=16000)

# Process the audio and generate the transcription
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
with torch.no_grad():
    predicted_ids = model.generate(input_features, max_length=225)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```
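Above, `librosa.load(..., sr=16000)` handles the resampling for you. If librosa is unavailable, the idea behind resampling can be sketched with naive linear interpolation (illustration only; real resamplers such as librosa or torchaudio also apply anti-aliasing filters and should be preferred):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naively resample a mono signal by linear interpolation.

    Illustrative sketch only -- no anti-aliasing is applied, so use a
    proper library (librosa, torchaudio) for real audio.
    """
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)     # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```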
## Training Details

The model was trained using the following hyperparameters:

- Learning Rate: 1e-05
- Train Batch Size: 16
- Eval Batch Size: 8
- Training Steps: 2000
- Warmup Steps: 250
- Optimizer: AdamW
- Mixed Precision: FP16
- Gradient Accumulation: 1
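The card lists warmup and total steps but not the scheduler. Assuming the `transformers` Trainer default (linear warmup to the peak learning rate, then linear decay to zero), the schedule implied by these hyperparameters can be sketched as:

```python
PEAK_LR = 1e-05      # from the hyperparameters above
WARMUP_STEPS = 250
TOTAL_STEPS = 2000

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero.

    This is the transformers Trainer's default "linear" schedule; the
    model card itself does not state which scheduler was used.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS)
```

Under this assumption the learning rate peaks at 1e-05 at step 250 and reaches zero at step 2000.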
## Intended Use & Limitations

This model is intended for transcribing general Hindi speech. Like all ASR models, its accuracy may degrade in the following scenarios:
- High background noise or overlapping speakers.
- Heavy regional dialects not represented in the standard FLEURS corpus.
- Extensive code-switching between Hindi and English (Hinglish) where English words are pronounced with thick accents.