Azerbaijani Whisper Turbo

Fine-tuned openai/whisper-large-v3-turbo for Azerbaijani automatic speech recognition.

Performance

| Model | Params | WER | CER |
|-------|--------|-----|-----|
| whisper-small (baseline) | 242M | 52.17% | 14.52% |
| whisper-medium (baseline) | 769M | 34.54% | 9.00% |
| whisper-large-v3 (baseline) | 1543M | 21.00% | 5.51% |
| whisper-large-v3-turbo (baseline) | 809M | 22.99% | 6.55% |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% |
| azerbaijani-whisper-turbo | 809M | 13.17% | 3.45% |

This model achieves an absolute WER about 8 points lower than whisper-large-v3 (13.17% vs 21.00%) while running substantially faster (RTF 0.024 vs 0.129 in the benchmark below).

Evaluated on FLEURS Azerbaijani test set.

Usage

import numpy as np
import soundfile as sf
import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("LocalDoc/azerbaijani-whisper-turbo")
model = WhisperForConditionalGeneration.from_pretrained("LocalDoc/azerbaijani-whisper-turbo")
model.eval()

# Load audio and collapse multi-channel recordings to mono
audio, sr = sf.read("audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)

# Whisper expects 16 kHz input; resample only if needed
if sr != 16000:
    audio = librosa.resample(np.asarray(audio, dtype=np.float32), orig_sr=sr, target_sr=16000)
    sr = 16000

inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
# Pin the decoder to Azerbaijani transcription
forced_ids = processor.get_decoder_prompt_ids(language="az", task="transcribe")

with torch.no_grad():
    ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(text)

Note: Audio must be 16kHz mono. If your audio has a different sample rate, use librosa.resample() as shown above.
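Whisper models also process audio in 30-second windows, so recordings longer than that must be split before transcription. Below is a minimal fixed-window sketch (the `chunk_audio` helper and its naive cut points are my own; splitting on silence or word boundaries would give better transcripts):

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sr: int = 16000, chunk_seconds: int = 30):
    """Split a mono waveform into consecutive chunks of at most chunk_seconds."""
    step = sr * chunk_seconds
    return [audio[i:i + step] for i in range(0, len(audio), step)]

# Example: 75 s of audio at 16 kHz splits into 30 s + 30 s + 15 s
waveform = np.zeros(16000 * 75, dtype=np.float32)
chunks = chunk_audio(waveform)
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 15.0]
```

Each chunk can then be passed through the processor and `model.generate` loop above, and the resulting texts concatenated.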

Requirements

pip install transformers torch soundfile librosa

Which model to choose?

| Model | Best for | WER | Speed |
|-------|----------|-----|-------|
| azerbaijani-whisper-small | CPU deployment, edge devices, low-resource environments | 20.54% | Fast on CPU |
| azerbaijani-whisper-turbo | GPU deployment, real-time transcription, highest accuracy | 13.17% | Very fast on GPU |

Benchmark Details

All models evaluated on FLEURS Azerbaijani test split (921 samples) with the same normalization (lowercase, no punctuation).
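The exact evaluation script is not included in this card; the sketch below implements the stated normalization (lowercase, no punctuation) and a standard word-level Levenshtein WER. Function names are my own, and libraries such as jiwer offer the same metric off the shelf:

```python
import re

def normalize(text: str) -> str:
    # Lowercase and strip punctuation; \w keeps Azerbaijani letters intact
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text, flags=re.UNICODE)
    return re.sub(r"\s+", " ", text).strip()

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = normalize(reference).split(), normalize(hypothesis).split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(wer("Salam, dünya!", "salam dünya"))  # 0.0
```

With this normalization, casing and punctuation differences cost nothing, so the reported WER reflects word choice only.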

| Model | Params | WER | CER | RTF (GPU) |
|-------|--------|-----|-----|-----------|
| whisper-tiny | 38M | 104.48% | 53.93% | 0.033 |
| whisper-base | 73M | 82.63% | 30.35% | 0.032 |
| whisper-small | 242M | 52.17% | 14.52% | 0.053 |
| whisper-medium | 769M | 34.54% | 9.00% | 0.097 |
| whisper-large-v3 | 1543M | 21.00% | 5.51% | 0.129 |
| whisper-large-v3-turbo | 809M | 22.99% | 6.55% | 0.024 |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% | ~0.05 |
| azerbaijani-whisper-turbo | 809M | 13.17% | 3.45% | ~0.024 |
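Assuming the usual definition, the RTF (real-time factor) column is processing time divided by audio duration, so lower is better and values below 1.0 mean faster than real time:

```python
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time / audio duration (lower is better)."""
    return processing_seconds / audio_seconds

# e.g. transcribing 100 s of audio in 2.4 s gives an RTF of 0.024,
# i.e. roughly 40x faster than real time
print(round(rtf(2.4, 100.0), 3))  # 0.024
```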

License

Apache 2.0
