|
|
--- |
|
|
language: |
|
|
- be |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- automatic-speech-recognition |
|
|
- whisper |
|
|
- generated_from_trainer |
|
|
- audio |
|
|
- speech |
|
|
- belarusian |
|
|
datasets: |
|
|
- sarulab-speech/commonvoice22_sidon |
|
|
metrics: |
|
|
- wer |
|
|
model-index: |
|
|
- name: Whisper Small Belarusian Custom |
|
|
results: |
|
|
- task: |
|
|
type: automatic-speech-recognition |
|
|
name: Speech Recognition |
|
|
dataset: |
|
|
name: Common Voice 22 Sidon (Belarusian) |
|
|
type: sarulab-speech/commonvoice22_sidon |
|
|
config: be |
|
|
split: test |
|
|
metrics: |
|
|
- type: wer |
|
|
value: 27.27 |
|
|
name: Word Error Rate |
|
|
--- |
|
|
|
|
|
# Whisper Small Belarusian (Common Voice 22 Sidon) |
|
|
|
|
|
A fine-tuned version of openai/whisper-small optimized for Belarusian speech recognition. On Belarusian speech it significantly outperforms both the base Whisper Small and the much larger Whisper Large V3.
|
|
|
|
|
--- |
|
|
|
|
|
## Benchmark Results |
|
|
|
|
|
| Model | Parameters | WER (lower is better) | |
|
|
|:------|:----------:|----------------------:| |
|
|
| openai/whisper-small (base) | 244M | 92.21% | |
|
|
| openai/whisper-large-v3 | 1550M | 63.64% | |
|
|
| **This model** | **244M** | **27.27%** |
|
|
|
|
|
**Key finding:** This fine-tuned Small model outperforms Whisper Large V3 by 36.37 percentage points while being roughly 6x smaller.
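The WER figures above are the standard word-level edit-distance ratio (the same quantity libraries such as `jiwer` or `evaluate` report). A minimal, dependency-free sketch of how such a score is computed:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over word sequences.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution (or match when r == h)
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)] / len(ref)

# One wrong word out of three -> WER ~ 33.33%
print(round(100 * wer("добры дзень сябры", "добры дзень сябар"), 2))
```

In practice the benchmark scores are averaged over the whole test split (total word errors divided by total reference words), not per utterance.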
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Base Model:** openai/whisper-small |
|
|
- **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset) |
|
|
- **Training Steps:** 1700 |
|
|
- **Learning Rate:** 1e-5 |
|
|
- **Batch Size:** 8 |
|
|
- **Final Loss:** ~0.21 |
|
|
- **Framework:** PyTorch, Transformers, Accelerate |
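For reference, the hyperparameters above map onto a `Seq2SeqTrainingArguments` configuration roughly like the sketch below. Only the learning rate, step count, and batch size come from this card; `output_dir`, `fp16`, and the eval-time generation flag are illustrative assumptions about the run.

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: learning_rate, max_steps, and batch size are from the card;
# the remaining settings are assumptions, not the actual training recipe.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-be-custom",  # assumed
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1700,
    fp16=True,                   # assumed: mixed precision on GPU
    predict_with_generate=True,  # decode text during eval so WER can be computed
)
```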
|
|
|
|
|
--- |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from transformers import pipeline |
|
|
|
|
|
pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom") |
|
|
result = pipe("audio_file.mp3") |
|
|
print(result["text"]) |
|
|
``` |
|
|
|
|
|
For longer audio files: |
|
|
|
|
|
```python |
|
|
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8) |
|
|
print(result["text"]) |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Description |
|
|
|
|
|
### English |
|
|
|
|
|
This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian due to limited representation in the original training data. Through fine-tuning on Common Voice 22 Sidon, this model achieves a 64.94 percentage point improvement over the base Small model and a 36.37 percentage point improvement over the Large V3 model. |
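The improvement figures quoted here are plain percentage-point differences between the WER numbers in the benchmark table:

```python
# WER (%) from the benchmark table above.
base_small, large_v3, fine_tuned = 92.21, 63.64, 27.27

print(round(base_small - fine_tuned, 2))  # 64.94 pp improvement vs base Small
print(round(large_v3 - fine_tuned, 2))    # 36.37 pp improvement vs Large V3
```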
|
|
|
|
|
### Русский |
|
|
|
|
|
Эта модель демонстрирует, что целенаправленное дообучение на языковых данных может значительно улучшить качество распознавания для малоресурсных языков. Базовые модели Whisper плохо справляются с белорусским языком из-за ограниченного представления в исходных обучающих данных. Благодаря дообучению на Common Voice 22 Sidon, эта модель показывает улучшение на 64.94 п.п. по сравнению с базовой Small моделью и на 36.37 п.п. по сравнению с Large V3. |
|
|
|
|
|
### Беларуская |
|
|
|
|
|
Гэтая мадэль дэманструе, што мэтанакіраванае данавучанне на моўных дадзеных можа значна палепшыць якасць распазнавання для маларэсурсных моў. Базавыя мадэлі Whisper дрэнна спраўляюцца з беларускай мовай з-за абмежаванага прадстаўніцтва ў зыходных навучальных дадзеных. Дзякуючы данавучанню на Common Voice 22 Sidon, гэтая мадэль паказвае паляпшэнне на 64.94 п.п. у параўнанні з базавай Small мадэллю і на 36.37 п.п. у параўнанні з Large V3. |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model |
|
|
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions |
|
|
- Best results on clear audio with minimal background noise |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use this model, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{whisper-small-be-custom, |
|
|
author = {aleton}, |
|
|
title = {Whisper Small Belarusian Custom}, |
|
|
year = {2026}, |
|
|
publisher = {Hugging Face}, |
|
|
url = {https://huggingface.co/aleton/whisper-small-be-custom} |
|
|
} |
|
|
``` |
|
|
|