Azerbaijani Whisper Small

Fine-tuned openai/whisper-small for Azerbaijani automatic speech recognition.

Performance

Model	Params	WER	CER
whisper-small (baseline)	242M	52.17%	14.52%
whisper-medium (baseline)	769M	34.54%	9.00%
whisper-large-v3 (baseline)	1543M	21.00%	5.51%
azerbaijani-whisper-small	242M	20.54%	5.72%

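This model achieves a lower WER than whisper-large-v3 (20.54% vs. 21.00%) while being roughly 6x smaller (242M vs. 1543M parameters). Its CER is slightly higher (5.72% vs. 5.51%).

All figures were measured on the FLEURS Azerbaijani test set.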
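
The comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the figures from the table; the helper name `relative_reduction` is an illustrative choice and not part of any library.

```python
# Figures copied from the performance table above (parameters in millions, errors in percent).
results = {
    "whisper-small": {"params_m": 242, "wer": 52.17, "cer": 14.52},
    "whisper-large-v3": {"params_m": 1543, "wer": 21.00, "cer": 5.51},
    "azerbaijani-whisper-small": {"params_m": 242, "wer": 20.54, "cer": 5.72},
}


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction in percent; positive means `new` is better."""
    return (baseline - new) / baseline * 100.0


fine_tuned = results["azerbaijani-whisper-small"]
base = results["whisper-small"]
large = results["whisper-large-v3"]

# How many times more parameters whisper-large-v3 has than the fine-tuned model.
size_ratio = large["params_m"] / fine_tuned["params_m"]

# Gains of fine-tuning over the whisper-small baseline it started from.
wer_gain_vs_base = relative_reduction(base["wer"], fine_tuned["wer"])
cer_gain_vs_base = relative_reduction(base["cer"], fine_tuned["cer"])

# Absolute differences against whisper-large-v3 (negative means the fine-tuned model is lower).
wer_diff_vs_large = fine_tuned["wer"] - large["wer"]
cer_diff_vs_large = fine_tuned["cer"] - large["cer"]

print(f"large-v3 / fine-tuned parameter ratio: {size_ratio:.2f}x")
print(f"relative WER reduction vs whisper-small: {wer_gain_vs_base:.1f}%")
print(f"relative CER reduction vs whisper-small: {cer_gain_vs_base:.1f}%")
print(f"WER difference vs large-v3: {wer_diff_vs_large:+.2f} points")
print(f"CER difference vs large-v3: {cer_diff_vs_large:+.2f} points")
```

The signs of the last two values show the trade-off: the fine-tuned model is ahead of whisper-large-v3 on WER and behind it on CER.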

Usage

pip install --upgrade transformers

import librosa
import soundfile as sf
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

MODEL_ID = "LocalDoc/azerbaijani-whisper-small"
TARGET_SR = 16000  # Whisper expects 16 kHz mono input

processor = WhisperProcessor.from_pretrained(MODEL_ID)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)

# Load the audio; soundfile returns shape (frames,) or (frames, channels)
audio, sr = sf.read("audio.wav", dtype="float32")

# Downmix multi-channel audio to mono
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Resample only when the file is not already at 16 kHz
if sr != TARGET_SR:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=TARGET_SR)
    sr = TARGET_SR

# Convert the waveform to log-mel input features
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Force Azerbaijani transcription instead of language detection or translation
forced_ids = processor.get_decoder_prompt_ids(language="az", task="transcribe")

with torch.no_grad():
    ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(text)

Note: The model expects 16 kHz mono audio. The example above downmixes multi-channel input to mono and resamples it with librosa.resample(). Passing audio at any other sample rate without resampling will produce incorrect transcriptions.
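
To catch this mistake early, the input can be validated before it reaches the processor. The following is a minimal sketch using only NumPy; `to_mono_float32` and `check_whisper_input` are hypothetical helper names, not part of transformers or librosa, and the (frames, channels) layout is the one soundfile returns.

```python
import numpy as np

TARGET_SR = 16000  # sample rate the Whisper feature extractor expects


def to_mono_float32(audio: np.ndarray) -> np.ndarray:
    """Downmix (frames, channels) audio to mono and cast it to float32."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:
        # Average the channels; assumes the soundfile layout (frames, channels).
        audio = audio.mean(axis=1)
    elif audio.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got {audio.ndim} dimensions")
    return audio


def check_whisper_input(audio: np.ndarray, sr: int) -> None:
    """Raise ValueError if the audio is not 16 kHz mono."""
    if audio.ndim != 1:
        raise ValueError("audio must be mono (1-D); downmix it first")
    if sr != TARGET_SR:
        raise ValueError(f"audio is {sr} Hz; resample it to {TARGET_SR} Hz first")


# Example: one second of silent stereo audio at 44.1 kHz.
stereo = np.zeros((44100, 2))
mono = to_mono_float32(stereo)
try:
    check_whisper_input(mono, 44100)
except ValueError as err:
    print(f"rejected: {err}")
```

Failing loudly here is a design choice: a wrong sample rate does not raise an error inside the model, it only degrades the transcription.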

Requirements

pip install transformers torch soundfile librosa

Benchmark Details

All models were evaluated on the FLEURS Azerbaijani test split (921 samples) with the same text normalization (lowercased, punctuation removed).

Model	Params	WER	CER	RTF (GPU)
whisper-tiny	38M	104.48%	53.93%	0.033
whisper-base	73M	82.63%	30.35%	0.032
whisper-small	242M	52.17%	14.52%	0.053
whisper-medium	769M	34.54%	9.00%	0.097
whisper-large-v3	1543M	21.00%	5.51%	0.129
whisper-large-v3-turbo	809M	22.99%	6.55%	0.024
azerbaijani-whisper-small	242M	20.54%	5.72%	~0.05
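
For readers who want to compute comparable metrics, here is a minimal pure-Python sketch of WER, CER, and RTF. It makes several assumptions that the card does not confirm: the normalization is approximated as lowercasing plus removal of Unicode punctuation, errors are aggregated over the whole corpus, and RTF is taken as processing time divided by audio duration. All function names are illustrative. Because insertions count as errors, WER can exceed 100%, which is how whisper-tiny reaches 104.48% in the table.

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip punctuation (an approximation of the card's normalization)."""
    # Note: str.lower() is locale-independent and maps "I" to "i", whereas
    # Azerbaijani pairs "I" with "ı"; how the authors handled this is not stated.
    text = text.lower()
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(refs, hyps) -> float:
    """Corpus-level word error rate in percent."""
    pairs = [(normalize(r).split(), normalize(h).split()) for r, h in zip(refs, hyps)]
    errors = sum(edit_distance(r, h) for r, h in pairs)
    total = sum(len(r) for r, _ in pairs)
    return 100.0 * errors / total


def cer(refs, hyps) -> float:
    """Corpus-level character error rate in percent (spaces count as characters)."""
    pairs = [(normalize(r), normalize(h)) for r, h in zip(refs, hyps)]
    errors = sum(edit_distance(r, h) for r, h in pairs)
    total = sum(len(r) for r, _ in pairs)
    return 100.0 * errors / total


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds


# Case and punctuation differences disappear after normalization.
print(wer(["Salam, dünya!"], ["salam dünya"]))
# Three inserted words against a two-word reference push WER above 100%.
print(wer(["bir iki"], ["bir iki üç dörd beş"]))
```

Under this definition, an RTF of about 0.05 means that one minute of audio is transcribed in roughly three seconds.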

License

Apache 2.0
