Whisper Small Belarusian (Common Voice 22 Sidon)

A fine-tuned version of openai/whisper-small optimized for Belarusian speech recognition. On Belarusian speech, this model substantially outperforms both the base Whisper Small and the much larger Whisper Large V3.


Benchmark Results

| Model | Parameters | WER (lower is better) |
|---|---|---|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| This model | 244M | 20.21% |

Key finding: this fine-tuned Small model outperforms Whisper Large V3 by 43.43 WER percentage points (63.64% → 20.21%) while using over 6x fewer parameters.
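WER (word error rate) is the word-level edit distance between the model's transcript and the reference, divided by the reference length. The numbers above were presumably computed with a standard metric library; the following is only a minimal pure-Python illustration of the metric itself:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of two reference words -> WER 0.5 (i.e. 50%)
print(wer("прывітанне сябры", "прывітанне сябар"))  # 0.5
```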


Training Details

  • Base Model: openai/whisper-small
  • Dataset: sarulab-speech/commonvoice22_sidon (Belarusian subset)
  • Training Steps: 1700
  • Learning Rate: 1e-5
  • Batch Size: 8
  • Final Loss: ~0.21
  • Framework: PyTorch, Transformers, Accelerate
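A quick sanity check on the schedule above. The keyword names in the comment are assumed `Seq2SeqTrainingArguments` equivalents; the actual training script is not published with this card.

```python
# Listed hyperparameters; in a transformers training script these would
# typically map to Seq2SeqTrainingArguments(max_steps=..., learning_rate=...,
# per_device_train_batch_size=...) -- assumed names, not the original script.
MAX_STEPS = 1700
LEARNING_RATE = 1e-5
BATCH_SIZE = 8

# With no gradient accumulation, the model sees one batch per step:
examples_seen = MAX_STEPS * BATCH_SIZE
print(examples_seen)  # 13600
```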

Usage

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")

# Transcribe a local audio file
result = pipe("audio_file.mp3")
print(result["text"])
```

For longer audio files:

```python
# Split long audio into 30-second chunks and transcribe them in batches
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
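For audio longer than Whisper's 30-second window, the pipeline splits the waveform into overlapping chunks, transcribes each, and merges the text. The sketch below only illustrates how such chunk boundaries tile a recording; the overlap value is assumed for illustration, and the pipeline's actual striding logic and defaults may differ.

```python
def chunk_bounds(total_s, chunk_s=30.0, overlap_s=5.0):
    """Return (start, end) times of overlapping chunks covering the audio."""
    step = chunk_s - overlap_s  # each chunk starts before the previous one ends
    bounds = []
    start = 0.0
    while start < total_s:
        bounds.append((start, min(start + chunk_s, total_s)))
        if start + chunk_s >= total_s:
            break  # this chunk already reaches the end of the audio
        start += step
    return bounds

# A 70-second file covered by three overlapping 30-second chunks:
print(chunk_bounds(70.0))  # [(0.0, 30.0), (25.0, 55.0), (50.0, 70.0)]
```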

Description

This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian because of its limited representation in the original training data. Fine-tuned on Common Voice 22 Sidon, this model improves on the base Small model by 72.00 WER percentage points and on Large V3 by 43.43 points.


Limitations

  • Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
  • Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
  • Best results on clear audio with minimal background noise

Citation

If you use this model, please cite:

```bibtex
@misc{whisper-small-be-custom,
  author    = {aleton},
  title     = {Whisper Small Belarusian Custom},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```