Azerbaijani Whisper Small

Fine-tuned openai/whisper-small for Azerbaijani automatic speech recognition.

Performance

| Model | Params | WER | CER |
|---|---|---|---|
| whisper-small (baseline) | 242M | 52.17% | 14.52% |
| whisper-medium (baseline) | 769M | 34.54% | 9.00% |
| whisper-large-v3 (baseline) | 1543M | 21.00% | 5.51% |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% |

This model reaches a slightly lower WER than whisper-large-v3 (20.54% vs. 21.00%) and a comparable CER (5.72% vs. 5.51%), while being about 6x smaller (242M vs. 1543M parameters).

Evaluated on FLEURS Azerbaijani test set.
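The size and accuracy comparison above can be checked with a few lines of arithmetic. The sketch below uses only the standard library and the figures from the table. The relative-reduction formula, (baseline − model) / baseline, is our choice of summary metric; the card does not state it.

```python
# Parameter counts (millions) and error rates (%) copied from the Performance table
params = {"whisper-small": 242, "whisper-large-v3": 1543, "azerbaijani-whisper-small": 242}
wer = {"whisper-small": 52.17, "whisper-large-v3": 21.00, "azerbaijani-whisper-small": 20.54}
cer = {"whisper-small": 14.52, "whisper-large-v3": 5.51, "azerbaijani-whisper-small": 5.72}


def relative_reduction(baseline: float, model: float) -> float:
    """Fraction by which `model` lowers the error rate relative to `baseline`."""
    return (baseline - model) / baseline


size_ratio = params["whisper-large-v3"] / params["azerbaijani-whisper-small"]
wer_gain_vs_small = relative_reduction(wer["whisper-small"], wer["azerbaijani-whisper-small"])
wer_gain_vs_large = relative_reduction(wer["whisper-large-v3"], wer["azerbaijani-whisper-small"])
cer_gain_vs_large = relative_reduction(cer["whisper-large-v3"], cer["azerbaijani-whisper-small"])

print(f"size ratio: {size_ratio:.2f}x")
print(f"WER reduction vs. whisper-small: {wer_gain_vs_small:.1%}")
print(f"WER reduction vs. whisper-large-v3: {wer_gain_vs_large:.1%}")
print(f"CER reduction vs. whisper-large-v3: {cer_gain_vs_large:.1%}")
```

With these inputs the script prints a size ratio of 6.38x. It also reports WER reductions of 60.6% against whisper-small and 2.2% against whisper-large-v3. The CER change against whisper-large-v3 is -3.8%, meaning the fine-tuned model's CER is slightly higher.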

Usage

```bash
pip install --upgrade transformers
```

```python
import torch
import librosa
import soundfile as sf
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained("LocalDoc/azerbaijani-whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("LocalDoc/azerbaijani-whisper-small")

# Load audio; soundfile returns shape (frames,) for mono or (frames, channels) otherwise
audio, sr = sf.read("audio.wav", dtype="float32")

# Convert stereo to mono FIRST: librosa.resample works along the last axis,
# so a (frames, channels) array must be downmixed before resampling
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Resample to 16 kHz if needed (important!)
if sr != 16000:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)

# Extract log-Mel input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Transcribe, forcing Azerbaijani and the transcription task
with torch.no_grad():
    ids = model.generate(inputs.input_features, language="az", task="transcribe")

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(text)
```

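Note: Audio must be 16 kHz mono. If your audio has a different sample rate, resample it with librosa.resample() as shown above. The processor trusts the sampling_rate argument, so passing non-resampled audio while declaring sampling_rate=16000 silently produces incorrect transcriptions.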
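A wrong sample rate fails silently, so it can help to validate the array right before calling the processor. The helper below is a hypothetical sketch: `check_whisper_input` is not part of transformers. It assumes the array layout returned by soundfile, (frames,) or (frames, channels), and the 16 kHz rate stated above. It also warns about clips longer than 30 seconds. Whisper's feature extractor pads or truncates input to a 30-second window by default, so the single generate call in the usage example only transcribes the first 30 seconds.

```python
import warnings

import numpy as np

WHISPER_SR = 16_000      # sample rate Whisper expects
WHISPER_WINDOW_S = 30.0  # default input window of the Whisper feature extractor


def check_whisper_input(audio: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: validate audio before passing it to WhisperProcessor.

    Raises ValueError for a wrong sample rate or a multi-channel array,
    warns if the clip exceeds one 30 s window, and returns float32 audio.
    """
    if sr != WHISPER_SR:
        raise ValueError(f"expected {WHISPER_SR} Hz audio, got {sr} Hz; resample first")
    if audio.ndim != 1:
        raise ValueError(f"expected mono audio of shape (frames,), got shape {audio.shape}")
    duration_s = audio.shape[0] / sr
    if duration_s > WHISPER_WINDOW_S:
        warnings.warn(f"{duration_s:.1f} s clip: only the first 30 s fit in one Whisper window")
    return audio.astype(np.float32, copy=False)


# One second of stereo audio at 44.1 kHz is rejected
stereo = np.zeros((44_100, 2), dtype=np.float64)
try:
    check_whisper_input(stereo, 44_100)
except ValueError as err:
    print(err)

# One second of mono audio at 16 kHz passes and comes back as float32
mono_16k = np.zeros(16_000, dtype=np.float64)
checked = check_whisper_input(mono_16k, 16_000)
print(checked.dtype, checked.shape)
```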

Requirements

```bash
pip install transformers torch soundfile librosa
```

Benchmark Details

All models were evaluated on the FLEURS Azerbaijani test split (921 samples) using the same text normalization: lowercasing and punctuation removal.
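The card does not say which scoring code produced these numbers. As a rough sketch of the stated protocol, the snippet below lowercases, strips punctuation, and computes corpus-level WER and CER as total edit distance divided by total reference length. Several details are assumptions, not the authors' documented setup:

- corpus-level (rather than per-utterance) aggregation;
- counting spaces as characters for CER;
- the Azerbaijani-specific casing of dotted and dotless I (İ→i, I→ı) applied before `str.lower()`.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and drop punctuation (an assumed version of the card's normalization)."""
    # Azerbaijani casing: dotted İ -> i and dotless I -> ı; plain str.lower()
    # would turn I into i and İ into i + combining dot
    text = text.replace("İ", "i").replace("I", "ı").lower()
    # Remove every character in a Unicode punctuation category (P*)
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())  # collapse repeated whitespace


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # delete r
                            curr[j - 1] + 1,          # insert h
                            prev[j - 1] + (r != h)))  # substitute (or match)
        prev = curr
    return prev[-1]


def error_rates(refs: list[str], hyps: list[str]) -> tuple[float, float]:
    """Corpus-level (WER, CER): total edits / total reference words or characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    word_errs = word_total = char_errs = char_total = 0
    for ref, hyp in zip(refs, hyps):
        ref_n, hyp_n = normalize(ref), normalize(hyp)
        word_errs += edit_distance(ref_n.split(), hyp_n.split())
        word_total += len(ref_n.split())
        char_errs += edit_distance(list(ref_n), list(hyp_n))  # spaces count as characters
        char_total += len(ref_n)
    return word_errs / word_total, char_errs / char_total


print(error_rates(["Salam, dünya!"], ["salam dünya"]))  # (0.0, 0.0)
print(error_rates(["bir iki üç"], ["bir üç"]))          # (0.3333333333333333, 0.4)
```

Because inserted words count as edits, a hypothesis such as "b c d" against the reference "a" scores a WER of 3.0 (300%).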

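| Model | Params | WER | CER | RTF (GPU) |
|---|---|---|---|---|
| whisper-tiny | 38M | 104.48% | 53.93% | 0.033 |
| whisper-base | 73M | 82.63% | 30.35% | 0.032 |
| whisper-small | 242M | 52.17% | 14.52% | 0.053 |
| whisper-medium | 769M | 34.54% | 9.00% | 0.097 |
| whisper-large-v3 | 1543M | 21.00% | 5.51% | 0.129 |
| whisper-large-v3-turbo | 809M | 22.99% | 6.55% | 0.024 |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% | ~0.05 |

Lower is better in every metric column. RTF is the real-time factor: processing time divided by audio duration. WER can exceed 100% because inserted words count as errors.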
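The card does not describe how RTF was timed (GPU model, batch size, precision). The sketch below only shows how to estimate the ratio on your own hardware. `transcribe` and `fake_transcribe` are hypothetical placeholders for whatever inference call you want to time.

```python
import time
from typing import Callable, Sequence


def real_time_factor(transcribe: Callable[[Sequence[float]], str],
                     audio: Sequence[float], sr: int = 16_000) -> float:
    """Wall-clock processing time divided by audio duration (lower is faster)."""
    duration_s = len(audio) / sr
    start = time.perf_counter()
    transcribe(audio)  # hypothetical inference call being timed
    elapsed_s = time.perf_counter() - start
    return elapsed_s / duration_s


def fake_transcribe(audio: Sequence[float]) -> str:
    """Hypothetical stand-in for model inference: takes about 0.05 s per call."""
    time.sleep(0.05)
    return ""


one_second = [0.0] * 16_000
print(f"RTF: {real_time_factor(fake_transcribe, one_second):.3f}")
```

On a GPU, CUDA kernels run asynchronously. Call torch.cuda.synchronize() before stopping the clock and discard a warm-up run. The card does not state whether its RTF figures were measured this way.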

License

Apache 2.0

Model size: 0.2B parameters (Safetensors, F32 tensors)