# Whisper Small Gujarati ASR
`openai/whisper-small` fine-tuned for Gujarati automatic speech recognition (speech-to-text).
## Training Details
| Parameter | Value |
|---|---|
| Base model | openai/whisper-small (244M params) |
| Dataset | shunyalabs/gujarati-speech-dataset (~20GB) |
| Training method | Full fine-tuning (no LoRA) |
| Learning rate | 1e-5 (linear decay) |
| Warmup steps | 500 |
| Max steps | 10,000 |
| Effective batch size | 32 (16 × 2 grad accum) |
| Precision | FP16 |
| Optimizer | AdamW |
| Gradient checkpointing | Enabled |
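The table maps directly onto Hugging Face `Seq2SeqTrainingArguments`. A minimal sketch of the matching configuration (argument names follow the `transformers` Trainer API; `output_dir` is a placeholder, and this is an illustration rather than the exact arguments used):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-gujarati",   # placeholder path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=2,           # effective batch size 16 x 2 = 32
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=10_000,
    fp16=True,
    gradient_checkpointing=True,
    optim="adamw_torch",
    predict_with_generate=True,              # generate during eval for WER
)
```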
## Recipe Sources
- IIT Madras Whisper Hindi recipe (vasista22/whisper-hindi-large-v2)
- Paper: Fine-tuning Whisper for Pashto ASR (arXiv:2604.06507), which finds that full fine-tuning clearly outperforms LoRA in low-resource settings
- Paper: Enhancing Whisper's Accuracy for Indian Languages (arXiv:2412.19785)
## Usage
```python
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="pathslash/whisper-small-gujarati",
)

result = transcriber("path/to/gujarati_audio.wav")
print(result["text"])
```
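Whisper encodes audio in fixed 30-second windows, so for longer recordings the pipeline call above accepts a `chunk_length_s` argument (e.g. `transcriber(path, chunk_length_s=30)`). The splitting itself is simple to sketch in plain NumPy; the helper below is illustrative (synthetic silent audio stands in for real speech):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30     # Whisper's fixed context window

def split_audio(audio, sr=SAMPLE_RATE, chunk_s=CHUNK_SECONDS):
    """Split a 1-D waveform into consecutive chunks of at most chunk_s seconds."""
    step = sr * chunk_s
    return [audio[i:i + step] for i in range(0, len(audio), step)]

# 75 seconds of silence stands in for a real recording
audio = np.zeros(75 * SAMPLE_RATE, dtype=np.float32)
chunks = split_audio(audio)
print(len(chunks))  # 3 chunks: 30 s + 30 s + 15 s
```

Note that naive splitting can cut words at chunk boundaries; the pipeline's built-in chunking uses overlapping strides to mitigate this.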
With explicit Gujarati language forcing:
```python
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("pathslash/whisper-small-gujarati")
model = WhisperForConditionalGeneration.from_pretrained("pathslash/whisper-small-gujarati")

# Load your audio (must be 16 kHz mono)
audio, sr = librosa.load("path/to/audio.wav", sr=16000)

input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features

# Force Gujarati transcription so Whisper skips automatic language detection
forced_decoder_ids = processor.get_decoder_prompt_ids(language="gujarati", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
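Transcription quality for ASR models is typically reported as word error rate (the dependency list under Training Script includes `evaluate` and `jiwer` for exactly this). For intuition, a minimal pure-Python WER — the `wer` helper here is a hand-rolled sketch, not the API of either library:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("એક બે ત્રણ ચાર", "એક બે ત્રણ"))  # 0.25: one word deleted out of four
```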
## Training Script
The full training script is available in this repo as `train_whisper_gujarati.py`. To run training yourself:
```bash
pip install transformers datasets evaluate jiwer torch accelerate trackio soundfile librosa
python train_whisper_gujarati.py
```
Recommended hardware: NVIDIA A10G (24GB) or better.