# Whisper Small Belarusian (Common Voice 22 Sidon)
A fine-tuned version of openai/whisper-small for Belarusian speech recognition. On Belarusian speech it substantially outperforms both the base Whisper Small and the much larger Whisper Large V3.
## Benchmark Results
| Model | Parameters | WER (lower is better) |
|---|---|---|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| This model | 244M | 27.27% |
Key finding: This fine-tuned Small model outperforms Whisper Large V3 by 36.37 percentage points, while being 6x smaller in size.
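WER is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words. A minimal stdlib sketch of the metric, for illustration only (the scores above were presumably computed with a standard library such as jiwer or evaluate):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance (illustrative)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One substitution out of three reference words -> WER of 33.33%
print(round(wer("гэта тэставы сказ", "гэта тэст сказ") * 100, 2))  # 33.33
```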
## Training Details
- Base Model: openai/whisper-small
- Dataset: sarulab-speech/commonvoice22_sidon (Belarusian subset)
- Training Steps: 1700
- Learning Rate: 1e-5
- Batch Size: 8
- Final Loss: ~0.21
- Framework: PyTorch, Transformers, Accelerate
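The hyperparameters above map onto a standard transformers seq2seq training configuration roughly as follows. This is a sketch, not the actual training script: the numeric values come from the list above, while `output_dir`, mixed precision, and `predict_with_generate` are assumptions.

```python
from transformers import Seq2SeqTrainingArguments

# Training configuration implied by the card. Base model: openai/whisper-small,
# data: the Belarusian subset of sarulab-speech/commonvoice22_sidon.
args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-be-custom",  # assumption
    max_steps=1700,                        # Training Steps
    learning_rate=1e-5,                    # Learning Rate
    per_device_train_batch_size=8,         # Batch Size
    fp16=True,                             # assumption: mixed precision
    predict_with_generate=True,            # assumption: generate during eval
)
```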
## Usage
```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
result = pipe("audio_file.mp3")
print(result["text"])
```
For longer audio files:
```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
## Description
This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian due to limited representation in the original training data. Through fine-tuning on Common Voice 22 Sidon, this model achieves a 64.94 percentage point improvement over the base Small model and a 36.37 percentage point improvement over the Large V3 model.
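The quoted improvements are plain differences of the reported WER figures, which a quick check confirms:

```python
# Percentage-point improvements derived from the reported WER figures.
wer_small, wer_large_v3, wer_fine_tuned = 92.21, 63.64, 27.27

print(round(wer_small - wer_fine_tuned, 2))     # 64.94 pp vs. base Small
print(round(wer_large_v3 - wer_fine_tuned, 2))  # 36.37 pp vs. Large V3
```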
## Limitations
- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
- Best results on clear audio with minimal background noise
## Citation
If you use this model, please cite:
```bibtex
@misc{whisper-small-be-custom,
  author    = {aleton},
  title     = {Whisper Small Belarusian Custom},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```