# Azerbaijani Whisper Turbo

Fine-tuned from `openai/whisper-large-v3-turbo` for Azerbaijani automatic speech recognition.
## Performance
| Model | Params | WER | CER |
|---|---|---|---|
| whisper-small (baseline) | 242M | 52.17% | 14.52% |
| whisper-medium (baseline) | 769M | 34.54% | 9.00% |
| whisper-large-v3 (baseline) | 1543M | 21.00% | 5.51% |
| whisper-large-v3-turbo (baseline) | 809M | 22.99% | 6.55% |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% |
| azerbaijani-whisper-turbo | 809M | 13.17% | 3.45% |
This model achieves a WER roughly 8 percentage points lower than whisper-large-v3 (13.17% vs. 21.00%) with nearly 2x faster inference.

Evaluated on the FLEURS Azerbaijani test set.
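WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words; CER is the same at character level. A minimal self-contained sketch of both metrics (pure Python, no metric library assumed; production evaluations typically use a library such as `jiwer`):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (one-row DP)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Note that WER can exceed 100% (as for whisper-tiny below) when the hypothesis contains more errors than the reference has words.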
## Usage
```python
import torch
import librosa
import numpy as np
import soundfile as sf
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("LocalDoc/azerbaijani-whisper-turbo")
model = WhisperForConditionalGeneration.from_pretrained("LocalDoc/azerbaijani-whisper-turbo")

# Load audio and convert to 16 kHz mono
audio, sr = sf.read("audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel to mono
if sr != 16000:
    audio = librosa.resample(np.asarray(audio, dtype=np.float32), orig_sr=sr, target_sr=16000)
    sr = 16000

inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
forced_ids = processor.get_decoder_prompt_ids(language="az", task="transcribe")

with torch.no_grad():
    ids = model.generate(inputs.input_features, forced_decoder_ids=forced_ids)

text = processor.batch_decode(ids, skip_special_tokens=True)[0]
print(text)
```
Note: Audio must be 16 kHz mono. If your audio has a different sample rate, resample it with `librosa.resample()` as shown above.
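Whisper processes audio in 30-second windows, and the feature extractor truncates longer inputs. For longer recordings, one simple approach is to split the waveform into 30-second chunks and transcribe each separately. A naive sketch (chunk boundaries can cut words mid-utterance, so an overlapping or silence-aware splitter is preferable in practice):

```python
import numpy as np

def chunk_audio(audio: np.ndarray, sr: int = 16000, chunk_s: float = 30.0):
    """Split a mono waveform into consecutive chunks of at most chunk_s seconds."""
    step = int(sr * chunk_s)
    return [audio[i:i + step] for i in range(0, len(audio), step)]

# Transcribe each chunk with the code above, then join the pieces:
# text = " ".join(transcribe(chunk) for chunk in chunk_audio(audio))
```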
## Requirements

```bash
pip install transformers torch soundfile librosa
```
## Which model to choose?
| Model | Best for | WER | Speed |
|---|---|---|---|
| azerbaijani-whisper-small | CPU deployment, edge devices, low-resource environments | 20.54% | Fast on CPU |
| azerbaijani-whisper-turbo | GPU deployment, real-time transcription, highest accuracy | 13.17% | Very fast on GPU |
## Benchmark Details
All models were evaluated on the FLEURS Azerbaijani test split (921 samples) with the same text normalization (lowercase, no punctuation).
| Model | Params | WER | CER | RTF (GPU) |
|---|---|---|---|---|
| whisper-tiny | 38M | 104.48% | 53.93% | 0.033 |
| whisper-base | 73M | 82.63% | 30.35% | 0.032 |
| whisper-small | 242M | 52.17% | 14.52% | 0.053 |
| whisper-medium | 769M | 34.54% | 9.00% | 0.097 |
| whisper-large-v3 | 1543M | 21.00% | 5.51% | 0.129 |
| whisper-large-v3-turbo | 809M | 22.99% | 6.55% | 0.024 |
| azerbaijani-whisper-small | 242M | 20.54% | 5.72% | ~0.05 |
| azerbaijani-whisper-turbo | 809M | 13.17% | 3.45% | ~0.024 |
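RTF (real-time factor) is the time spent transcribing divided by the duration of the audio; lower is better, and values below 1.0 mean faster than real time:

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = transcription time / audio duration (lower is better)."""
    return inference_seconds / audio_seconds

# An RTF of 0.024 means 100 s of audio is transcribed in about 2.4 s.
print(real_time_factor(2.4, 100.0))
```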
## License
Apache 2.0