Whisper Small Belarusian (Common Voice 22 Sidon)

A fine-tuned version of openai/whisper-small optimized for Belarusian speech recognition. On Belarusian speech, this model substantially outperforms both the base Whisper Small and the much larger Whisper Large V3.


Benchmark Results

| Model | Parameters | WER (lower is better) |
|---|---|---|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| This model | 244M | 20.21% |

Key finding: this fine-tuned Small model outperforms Whisper Large V3 by 43.43 WER percentage points (63.64% → 20.21%) while using over 6x fewer parameters.
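WER (word error rate) is the word-level edit distance between the model's transcript and the reference, divided by the reference length. The numbers above were presumably computed with a standard metric library; the following is only a minimal pure-Python illustration of the metric itself:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of two reference words -> WER 0.5 (i.e. 50%)
print(wer("прывітанне сябры", "прывітанне сябар"))  # 0.5
```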


Training Details

  • Base Model: openai/whisper-small
  • Dataset: sarulab-speech/commonvoice22_sidon (Belarusian subset)
  • Training Steps: 1700
  • Learning Rate: 1e-5
  • Batch Size: 8
  • Final Loss: ~0.21
  • Framework: PyTorch, Transformers, Accelerate
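A quick sanity check on the schedule above. The keyword names in the comment are assumed `Seq2SeqTrainingArguments` equivalents; the actual training script is not published with this card.

```python
# Listed hyperparameters; in a transformers training script these would
# typically map to Seq2SeqTrainingArguments(max_steps=..., learning_rate=...,
# per_device_train_batch_size=...) -- assumed names, not the original script.
MAX_STEPS = 1700
LEARNING_RATE = 1e-5
BATCH_SIZE = 8

# With no gradient accumulation, the model sees one batch per step:
examples_seen = MAX_STEPS * BATCH_SIZE
print(examples_seen)  # 13600
```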

Usage

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")

# Transcribe a local audio file
result = pipe("audio_file.mp3")
print(result["text"])
```

For longer audio files:

```python
# Split long audio into 30-second chunks and transcribe them in batches
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
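For audio longer than Whisper's 30-second window, the pipeline splits the waveform into overlapping chunks, transcribes each, and merges the text. The sketch below only illustrates how such chunk boundaries tile a recording; the overlap value is assumed for illustration, and the pipeline's actual striding logic and defaults may differ.

```python
def chunk_bounds(total_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) times of overlapping chunks covering the audio."""
    step = chunk_s - overlap_s  # each chunk starts before the previous one ends
    bounds = []
    start = 0.0
    while start < total_s:
        bounds.append((start, min(start + chunk_s, total_s)))
        if start + chunk_s >= total_s:
            break  # this chunk already reaches the end of the audio
        start += step
    return bounds

# A 70-second file covered by three overlapping 30-second chunks:
print(chunk_bounds(70.0))  # [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```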

Description

This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian because of its limited representation in the original training data. Fine-tuned on Common Voice 22 Sidon, this model improves on the base Small model by 72.00 WER percentage points and on Large V3 by 43.43 points.


Limitations

  • Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
  • Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
  • Best results on clear audio with minimal background noise

Citation

If you use this model, please cite:

```bibtex
@misc{whisper-small-be-custom,
  author    = {aleton},
  title     = {Whisper Small Belarusian Custom},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```