---
language:
- be
license: apache-2.0
tags:
- automatic-speech-recognition
- whisper
- generated_from_trainer
- audio
- speech
- belarusian
datasets:
- sarulab-speech/commonvoice22_sidon
metrics:
- wer
model-index:
- name: Whisper Small Belarusian Custom
results:
- task:
type: automatic-speech-recognition
name: Speech Recognition
dataset:
name: Common Voice 22 Sidon (Belarusian)
type: sarulab-speech/commonvoice22_sidon
config: be
split: test
metrics:
- type: wer
value: 27.27
name: Word Error Rate
---
# Whisper Small Belarusian (Common Voice 22 Sidon)
A fine-tuned version of openai/whisper-small optimized for Belarusian speech recognition. On Belarusian speech it outperforms not only the base Whisper Small but also the much larger Whisper Large V3.
---
## Benchmark Results
| Model | Parameters | WER (lower is better) |
|:------|:----------:|----------------------:|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| **This model** | **244M** | **27.27%** |
**Key finding:** this fine-tuned Small model outperforms Whisper Large V3 by 36.37 percentage points while being roughly 6x smaller.
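The exact evaluation script is not published with this card, but a minimal sketch of how such a WER figure can be reproduced with the `transformers` pipeline and the `evaluate` library is shown below. The dataset identifier, config, and split come from this card's metadata; the column names and decoding settings are assumptions.

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

# Belarusian test split used for the benchmark (per this card's metadata).
ds = load_dataset("sarulab-speech/commonvoice22_sidon", "be", split="test")

pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
wer = evaluate.load("wer")

# Transcribe each clip and compare against the reference transcription.
# Column names ("audio", "sentence") follow Common Voice conventions and are assumptions.
predictions = [pipe(sample["audio"])["text"] for sample in ds]
references = [sample["sentence"] for sample in ds]

print(f"WER: {100 * wer.compute(predictions=predictions, references=references):.2f}%")
```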
---
## Training Details
- **Base Model:** openai/whisper-small
- **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset)
- **Training Steps:** 1700
- **Learning Rate:** 1e-5
- **Batch Size:** 8
- **Final Loss:** ~0.21
- **Framework:** PyTorch, Transformers, Accelerate
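The training script itself is not published here; the following is a minimal sketch of a `Seq2SeqTrainingArguments` setup matching the hyperparameters listed above. The output directory, logging, and checkpointing settings are illustrative, not the values actually used.

```python
from transformers import Seq2SeqTrainingArguments

# Hyperparameters mirror the list above; everything else is an illustrative default.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-be-custom",  # illustrative path
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1700,
    fp16=True,                     # assumption: mixed-precision training on GPU
    predict_with_generate=True,    # decode with generate() during evaluation
    logging_steps=25,              # illustrative
    save_steps=500,                # illustrative
)
```

These arguments would plug into a standard `Seq2SeqTrainer` together with the Whisper processor and a data collator that pads audio features and label sequences.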
---
## Usage
```python
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
result = pipe("audio_file.mp3")
print(result["text"])
```
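If the checkpoint's saved generation config does not already pin the target language, Whisper's language and task can be set explicitly through `generate_kwargs`. Whether this is needed depends on how the checkpoint was exported, so treat it as an optional safeguard:

```python
result = pipe(
    "audio_file.mp3",
    generate_kwargs={"language": "belarusian", "task": "transcribe"},
)
print(result["text"])
```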
For longer audio files:
```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
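For long recordings you may also want segment timestamps; the pipeline supports this via `return_timestamps=True`, which adds a `chunks` field alongside the full transcript:

```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8, return_timestamps=True)
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```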
---
## Description
This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian due to its limited representation in the original training data. Through fine-tuning on Common Voice 22 Sidon, this model achieves a 64.94 percentage point improvement over the base Small model and a 36.37 percentage point improvement over the Large V3 model. (The original Russian and Belarusian versions of this section were direct translations of the same text.)
---
## Limitations
- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
- Best results on clear audio with minimal background noise
---
## Citation
If you use this model, please cite:
```bibtex
@misc{whisper-small-be-custom,
  author    = {aleton},
  title     = {Whisper Small Belarusian Custom},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```