Whisper Small - Hindi Fine-Tuned

This model is a fine-tuned version of openai/whisper-small for Hindi Speech-to-Text (ASR) applications. It was fine-tuned on the hi_in configuration of the Google FLEURS dataset.

The fine-tuning process significantly improved the transcription accuracy, reducing the Word Error Rate (WER) from 68.61% (baseline) down to 26.87%, and the Character Error Rate (CER) from 34.43% down to 10.41%.

Model Details

  • Base Model: openai/whisper-small (244M parameters)
  • Language: Hindi (hi)
  • Task: Automatic Speech Recognition (ASR)
  • Dataset: Google FLEURS (Hindi - India)
  • License: MIT

Evaluation Results

The model was evaluated on the strictly held-out test split (418 samples) of the FLEURS Hindi dataset.

| Metric | Whisper Small (Base) | Whisper Small (Fine-Tuned) | Relative Improvement |
|---|---|---|---|
| Word Error Rate (WER) | 68.61% | 26.87% | ↓ 60.8% |
| Character Error Rate (CER) | 34.43% | 10.41% | ↓ 69.8% |

These results demonstrate a successful adaptation to Hindi phonetics and the Devanagari script, with a substantial reduction in transcription errors.
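
Both metrics are edit-distance ratios, computed over words (WER) and characters (CER). A minimal pure-Python sketch illustrates the computation; this is for intuition only and is not the exact evaluation script behind the numbers above:

```python
def levenshtein(ref, hyp):
    # Classic dynamic-programming edit distance between two sequences,
    # using a single rolling row for O(len(hyp)) memory.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

def wer(reference, hypothesis):
    # Word Error Rate: edits between word sequences / reference word count.
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    # Character Error Rate: edits between character sequences / reference length.
    return levenshtein(list(reference), list(hypothesis)) / len(reference)
```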

Usage

You can use this model directly with the Hugging Face transformers library:

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
import librosa

# Load the fine-tuned model and processor
model_id = "rishii100/whisper-small-hindi"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Load your Hindi audio file (resampled to 16 kHz, Whisper's expected rate)
audio_path = "path/to/your/hindi_audio.wav"
audio, sr = librosa.load(audio_path, sr=16000)

# Process and generate transcription
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

with torch.no_grad():
    predicted_ids = model.generate(input_features, max_length=225)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```
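
Note that Whisper's encoder operates on a fixed 30-second window, so longer recordings should be split before calling generate. A minimal sketch (the 30-second chunk length is Whisper's standard window; the helper name is ours):

```python
def chunk_audio(audio, sample_rate=16000, chunk_seconds=30):
    # Split a 1-D waveform into consecutive fixed-length chunks;
    # the final chunk may be shorter and is padded by the processor.
    chunk_len = sample_rate * chunk_seconds
    return [audio[i:i + chunk_len] for i in range(0, len(audio), chunk_len)]
```

Each chunk can then be run through the processor/generate steps above and the per-chunk transcripts concatenated.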

Training Details

The model was trained using the following hyperparameters:

  • Learning Rate: 1e-05
  • Train Batch Size: 16
  • Eval Batch Size: 8
  • Training Steps: 2000
  • Warmup Steps: 250
  • Optimizer: AdamW
  • Mixed Precision: FP16
  • Gradient Accumulation: 1
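
With 250 warmup steps out of 2000, the learning rate ramps linearly up to 1e-05 and then, assuming the transformers Trainer's default linear scheduler (the source does not state the scheduler), decays linearly back to zero:

```python
def lr_at_step(step, peak_lr=1e-5, warmup_steps=250, total_steps=2000):
    # Linear warmup to peak_lr, then linear decay to zero
    # (the "linear" schedule that transformers' Trainer uses by default).
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```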

Intended Use & Limitations

This model is ideal for transcribing general Hindi speech. However, like all speech models, it may experience performance degradation in the following scenarios:

  • High background noise or overlapping speakers.
  • Heavy regional dialects not represented in the standard FLEURS corpus.
  • Extensive code-switching between Hindi and English (Hinglish), especially where English words are pronounced with strong Hindi accents.