# Azerbaijani Whisper Turbo

Fine-tuned from `openai/whisper-large-v3-turbo` for Azerbaijani automatic speech recognition.
## Performance
| Model | Params | WER | CER |
|---|---|---|---|
| whisper-small (baseline) | 242M | 52.17% | 14.52% |
| whisper-medium (baseline) | 769M | 34.54% | 9.00% |
| whisper-large-v3 (baseline) | 1543M | 21.00% | 5.51% |
| whisper-large-v3-turbo (baseline) | 809M | 22.99% | 6.55% |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% |
| azerbaijani-whisper-turbo | 809M | 13.17% | 3.45% |
This model achieves a WER roughly 8 percentage points lower than whisper-large-v3 (13.17% vs. 21.00%) with nearly 2x faster inference.

Evaluated on the FLEURS Azerbaijani test set.
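WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words; CER is the same at character level. A minimal self-contained sketch of both metrics (pure Python, no metric library assumed; production evaluations typically use a library such as `jiwer`):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Note that WER can exceed 100% (as for whisper-tiny below) when the hypothesis contains more errors than the reference has words.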
## Usage
```python
import torch
import librosa
import numpy as np
import soundfile as sf
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("LocalDoc/azerbaijani-whisper-turbo")
model = WhisperForConditionalGeneration.from_pretrained("LocalDoc/azerbaijani-whisper-turbo")

# Load audio and convert to 16 kHz mono
audio, sr = sf.read("audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel to mono
if sr != 16000:
    audio = librosa.resample(np.asarray(audio, dtype=np.float32), orig_sr=sr, target_sr=16000)
    sr = 16000

inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
forced_ids = processor.get_decoder_prompt_ids(language="az", task="transcribe")

with torch.no_grad():
    ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(text)
```
Note: Audio must be 16 kHz mono. If your audio has a different sample rate, resample it with `librosa.resample()` as shown above.
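Whisper processes audio in 30-second windows, and the feature extractor truncates longer inputs. For longer recordings, one simple approach is to split the waveform into 30-second chunks and transcribe each separately. A naive sketch (chunk boundaries can cut words mid-utterance, so an overlapping or silence-aware splitter is preferable in practice):

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sr: int = 16000, chunk_s: float = 30.0):
    """Split a mono waveform into consecutive chunks of at most chunk_s seconds."""
    step = int(sr * chunk_s)
    return [audio[i:i + step] for i in range(0, len(audio), step)]

# Transcribe each chunk with the code above, then join the pieces:
# text = " ".join(transcribe(chunk) for chunk in chunk_audio(audio))
```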
## Requirements

```bash
pip install transformers torch soundfile librosa
```
## Which model to choose?
| Model | Best for | WER | Speed |
|---|---|---|---|
| azerbaijani-whisper-small | CPU deployment, edge devices, low-resource environments | 20.54% | Fast on CPU |
| azerbaijani-whisper-turbo | GPU deployment, real-time transcription, highest accuracy | 13.17% | Very fast on GPU |
## Benchmark Details
All models were evaluated on the FLEURS Azerbaijani test split (921 samples) with the same text normalization (lowercase, no punctuation).
| Model | Params | WER | CER | RTF (GPU) |
|---|---|---|---|---|
| whisper-tiny | 38M | 104.48% | 53.93% | 0.033 |
| whisper-base | 73M | 82.63% | 30.35% | 0.032 |
| whisper-small | 242M | 52.17% | 14.52% | 0.053 |
| whisper-medium | 769M | 34.54% | 9.00% | 0.097 |
| whisper-large-v3 | 1543M | 21.00% | 5.51% | 0.129 |
| whisper-large-v3-turbo | 809M | 22.99% | 6.55% | 0.024 |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% | ~0.05 |
| azerbaijani-whisper-turbo | 809M | 13.17% | 3.45% | ~0.024 |
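RTF (real-time factor) is the time spent transcribing divided by the duration of the audio; lower is better, and values below 1.0 mean faster than real time:

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = transcription time / audio duration (lower is better)."""
    return inference_seconds / audio_seconds

# An RTF of 0.024 means 100 s of audio is transcribed in about 2.4 s.
print(real_time_factor(2.4, 100.0))
```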
## License
Apache 2.0