Whisper Small – Gujarati ASR

Fine-tuned openai/whisper-small for Gujarati automatic speech recognition (speech-to-text).

Training Details

| Parameter | Value |
|---|---|
| Base model | openai/whisper-small (244M params) |
| Dataset | shunyalabs/gujarati-speech-dataset (~20 GB) |
| Training method | Full fine-tuning (no LoRA) |
| Learning rate | 1e-5 (linear decay) |
| Warmup steps | 500 |
| Max steps | 10,000 |
| Effective batch size | 32 (16 × 2 gradient accumulation steps) |
| Precision | FP16 |
| Optimizer | AdamW |
| Gradient checkpointing | Enabled |
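
To make the schedule in the table concrete, here is a minimal sketch of how the effective batch size and the per-step learning rate work out. It assumes a single GPU and the common "linear warmup from zero, then linear decay to zero" shape; the actual training script may differ in detail, and the function names are illustrative only.

```python
# Values copied from the training table; everything else is an assumption.
BASE_LR = 1e-5
WARMUP_STEPS = 500
MAX_STEPS = 10_000
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 2


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step (single GPU assumed)."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step.

    Assumed shape: linear warmup from 0 to BASE_LR over WARMUP_STEPS,
    then linear decay from BASE_LR to 0 at MAX_STEPS.
    """
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, MAX_STEPS - step)
    return BASE_LR * remaining / (MAX_STEPS - WARMUP_STEPS)


print("effective batch size:", effective_batch_size())
print("samples seen over the run:", effective_batch_size() * MAX_STEPS)
for step in (0, 250, 500, 5_250, 10_000):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Under these assumptions the peak rate of 1e-5 is reached at step 500, and the run processes 32 × 10,000 = 320,000 training samples in total (counting repeats across epochs).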

Recipe Sources

  • IIT Madras Whisper Hindi recipe (vasista22/whisper-hindi-large-v2)
  • Paper: Fine-tuning Whisper for Pashto ASR (arXiv: 2604.06507); supports full fine-tuning over LoRA for low-resource languages
  • Paper: Enhancing Whisper's Accuracy for Indian Languages (arXiv: 2412.19785)

Usage

from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="pathslash/whisper-small-gujarati",
)

result = transcriber("path/to/gujarati_audio.wav")
print(result["text"])

With explicit Gujarati language forcing:

from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("pathslash/whisper-small-gujarati")
model = WhisperForConditionalGeneration.from_pretrained("pathslash/whisper-small-gujarati")

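# Load the audio; librosa resamples to 16 kHz and downmixes to mono by default
import librosa
audio, sr = librosa.load("path/to/audio.wav", sr=16000)

input_features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features
# Force Gujarati transcription instead of relying on language auto-detection
predicted_ids = model.generate(input_features, language="gujarati", task="transcribe")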
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
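
If you load audio with a library that does not resample for you, it can help to check the file first. The following is a minimal sketch using only Python's standard `wave` module, so it handles uncompressed PCM WAV files only; `describe_wav` is a hypothetical helper and not part of this repo.

```python
import wave

TARGET_RATE = 16_000  # sampling rate the Whisper feature extractor expects


def describe_wav(path: str) -> dict:
    """Read a PCM WAV header and report whether it needs conversion."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        # True when the file is not already 16 kHz mono
        "needs_conversion": rate != TARGET_RATE or channels != 1,
    }
```

A file flagged with `needs_conversion` can be passed through `librosa.load(path, sr=16000)` as in the example above, which handles both resampling and downmixing.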

Training Script

The full training script is available in this repo: train_whisper_gujarati.py

To run training yourself:

pip install transformers datasets evaluate jiwer torch accelerate trackio soundfile librosa
python train_whisper_gujarati.py

Recommended hardware: NVIDIA A10G (24GB) or better.
