---
language:
- be
license: apache-2.0
tags:
- automatic-speech-recognition
- whisper
- generated_from_trainer
- audio
- speech
- belarusian
datasets:
- sarulab-speech/commonvoice22_sidon
metrics:
- wer
model-index:
- name: Whisper Small Belarusian Custom
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 22 Sidon (Belarusian)
      type: sarulab-speech/commonvoice22_sidon
      config: be
      split: test
    metrics:
    - type: wer
      value: 27.27
      name: Word Error Rate
---

# Whisper Small Belarusian (Common Voice 22 Sidon)

A fine-tuned version of openai/whisper-small for Belarusian speech recognition. On the Belarusian test split of Common Voice 22 Sidon, it reaches a lower word error rate (WER) than both the original Whisper Small and the much larger Whisper Large V3.

---

## Benchmark Results

| Model | Parameters | WER (lower is better) |
|:------|:----------:|----------------------:|
| openai/whisper-small (not fine-tuned) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| **This model** | **244M** | **27.27%** |

**Key finding:** This fine-tuned Small model lowers WER by 36.37 percentage points relative to Whisper Large V3 while using roughly 6x fewer parameters (244M vs. 1550M).
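
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration, not the evaluation script used for this model. `word_error_rate` is a hypothetical helper, and it assumes plain whitespace tokenization; the text normalization applied before scoring the table above is not stated in this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("я люблю родную мову", "я люблю мову"))  # 0.25
```

Because inserted words also count as errors, WER can exceed 100% when the output is much longer than the reference.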

---

## Training Details

- **Base Model:** openai/whisper-small
- **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset)
- **Training Steps:** 1700
- **Learning Rate:** 1e-5
- **Batch Size:** 8
- **Final Loss:** ~0.21
- **Framework:** PyTorch, Transformers, Accelerate
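
As a rough sketch, the hyperparameters above could be written as keyword arguments for `Seq2SeqTrainingArguments` in Transformers. The argument names are real Transformers parameters, but several things here are assumptions that this card does not state: that "Batch Size" means the per-device batch size, that training used a single device without gradient accumulation, and the `output_dir` value, which is purely hypothetical.

```python
# Values copied from the Training Details list; everything else is an assumption.
TRAINING_KWARGS = {
    "output_dir": "./whisper-small-be",   # hypothetical path, not from the card
    "max_steps": 1700,
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,     # assumption: listed batch size is per device
}

# Assumptions: one device, no gradient accumulation.
NUM_DEVICES = 1
GRAD_ACCUM_STEPS = 1


def examples_seen(kwargs: dict, num_devices: int = NUM_DEVICES,
                  grad_accum_steps: int = GRAD_ACCUM_STEPS) -> int:
    """Number of training examples processed over the whole run."""
    effective_batch = (
        kwargs["per_device_train_batch_size"] * num_devices * grad_accum_steps
    )
    return kwargs["max_steps"] * effective_batch


print(examples_seen(TRAINING_KWARGS))  # 13600

# With Transformers installed, the dictionary could be passed on like this:
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(**TRAINING_KWARGS)
```

Under these assumptions the run processed 13,600 examples; a different device count or accumulation setting would scale that figure accordingly.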

---

## Usage

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
result = pipe("audio_file.mp3")
print(result["text"])
```

For audio longer than Whisper's 30-second input window, reuse the same `pipe` with chunked inference:

```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
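
Multilingual Whisper checkpoints can auto-detect the spoken language, which occasionally picks a related language instead of Belarusian. If that happens, one option is to pin the language and task through `generate_kwargs`. This is a minimal sketch, assuming a recent Transformers release that accepts language names; `transcribe_belarusian` is a hypothetical helper, and whether this checkpoint's generation config already pins the language is not stated in this card.

```python
# Passed through to Whisper's generate() call by the ASR pipeline.
GENERATE_KWARGS = {"language": "belarusian", "task": "transcribe"}


def transcribe_belarusian(pipe, audio_path, chunk_length_s=30, batch_size=8):
    """Run an ASR pipeline with the language pinned to Belarusian.

    `pipe` is any callable with the interface of a Transformers
    automatic-speech-recognition pipeline.
    """
    result = pipe(
        audio_path,
        chunk_length_s=chunk_length_s,
        batch_size=batch_size,
        generate_kwargs=GENERATE_KWARGS,
    )
    return result["text"].strip()


# Example usage (requires Transformers and downloads the model):
# from transformers import pipeline
# pipe = pipeline("automatic-speech-recognition",
#                 model="aleton/whisper-small-be-custom")
# print(transcribe_belarusian(pipe, "long_audio.mp3"))
```

Taking the pipeline as an argument keeps the model loaded once across many files instead of rebuilding it per call.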

---

## Description

### English

This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve recognition quality for low-resource languages. The original Whisper checkpoints struggle with Belarusian because the language has limited representation in their training data. After fine-tuning on Common Voice 22 Sidon, this model reduces WER by 64.94 percentage points relative to the original Small model and by 36.37 percentage points relative to Large V3.
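
The percentage-point figures can be checked with a few lines of arithmetic. The sketch below uses the two baseline WER values from the benchmark table and the 27.27% WER recorded in this card's `model-index` metadata; the helper names are hypothetical, and the relative-reduction figure is a derived quantity that the card itself does not report.

```python
WER_SMALL = 92.21       # openai/whisper-small, not fine-tuned
WER_LARGE_V3 = 63.64    # openai/whisper-large-v3
WER_FINE_TUNED = 27.27  # this model, from the model-index metadata


def percentage_point_gain(baseline: float, tuned: float) -> float:
    """Absolute WER reduction in percentage points."""
    return round(baseline - tuned, 2)


def relative_reduction(baseline: float, tuned: float) -> float:
    """WER reduction as a percentage of the baseline WER."""
    return round(100 * (baseline - tuned) / baseline, 1)


print(percentage_point_gain(WER_SMALL, WER_FINE_TUNED))     # 64.94
print(percentage_point_gain(WER_LARGE_V3, WER_FINE_TUNED))  # 36.37
print(relative_reduction(WER_SMALL, WER_FINE_TUNED))
print(relative_reduction(WER_LARGE_V3, WER_FINE_TUNED))
```

Percentage points measure the absolute gap between two rates, while the relative reduction expresses the same gap as a share of the baseline error.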

### Русский

Эта модель демонстрирует, что целенаправленное дообучение на языковых данных может значительно улучшить качество распознавания для малоресурсных языков. Базовые модели Whisper плохо справляются с белорусским языком из-за ограниченного представления в исходных обучающих данных. Благодаря дообучению на Common Voice 22 Sidon, эта модель показывает улучшение на 64.94 п.п. по сравнению с базовой Small моделью и на 36.37 п.п. по сравнению с Large V3.

### Беларуская

Гэтая мадэль дэманструе, што мэтанакіраванае данавучанне на моўных дадзеных можа значна палепшыць якасць распазнавання для маларэсурсных моў. Базавыя мадэлі Whisper дрэнна спраўляюцца з беларускай мовай з-за абмежаванага прадстаўніцтва ў зыходных навучальных дадзеных. Дзякуючы данавучанню на Common Voice 22 Sidon, гэтая мадэль паказвае паляпшэнне на 64.94 п.п. у параўнанні з базавай Small мадэллю і на 36.37 п.п. у параўнанні з Large V3.

---

## Limitations

- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
- Best results on clear audio with minimal background noise

---

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-small-be-custom,
  author = {aleton},
  title = {Whisper Small Belarusian Custom},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```