---
language:
- be
license: apache-2.0
tags:
- automatic-speech-recognition
- whisper
- generated_from_trainer
- audio
- speech
- belarusian
datasets:
- sarulab-speech/commonvoice22_sidon
metrics:
- wer
model-index:
- name: Whisper Small Belarusian Custom
results:
- task:
type: automatic-speech-recognition
name: Speech Recognition
dataset:
name: Common Voice 22 Sidon (Belarusian)
type: sarulab-speech/commonvoice22_sidon
config: be
split: test
metrics:
- type: wer
value: 27.27
name: Word Error Rate
---
# Whisper Small Belarusian (Common Voice 22 Sidon)
A fine-tuned version of openai/whisper-small optimized for Belarusian speech recognition. On Belarusian test data it substantially outperforms not only the base Whisper Small but also the much larger Whisper Large V3.
---
## Benchmark Results
| Model | Parameters | WER (lower is better) |
|:------|:----------:|----------------------:|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| **This model** | **244M** | **27.27%** |
**Key finding:** This fine-tuned Small model outperforms Whisper Large V3 by 36.37 percentage points, with roughly 6x fewer parameters.
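The WER figures above count word-level substitutions, deletions, and insertions against the reference transcript. Production evaluations typically use the `evaluate` or `jiwer` libraries; the minimal pure-Python sketch below is for illustration only.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four -> WER 0.25
print(wer("кот сядзіць на стале", "кот сядзіць на крэсле"))  # 0.25
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why the base model's 92.21% does not imply near-total failure on every utterance.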
---
## Training Details
- **Base Model:** openai/whisper-small
- **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset)
- **Training Steps:** 1700
- **Learning Rate:** 1e-5
- **Batch Size:** 8
- **Final Loss:** ~0.21
- **Framework:** PyTorch, Transformers, Accelerate
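The hyperparameters above roughly correspond to a `Seq2SeqTrainingArguments` setup like the sketch below. The exact training script was not published; only the values stated in this card are grounded, and everything else (output directory, fp16, `predict_with_generate`) is a placeholder or a common Whisper fine-tuning default.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-be-custom",  # placeholder path
    per_device_train_batch_size=8,           # "Batch Size: 8"
    learning_rate=1e-5,                      # "Learning Rate: 1e-5"
    max_steps=1700,                          # "Training Steps: 1700"
    fp16=True,                               # assumed; common on GPU
    predict_with_generate=True,              # required to compute WER during eval
    report_to="none",
)
```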
---
## Usage
```python
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
result = pipe("audio_file.mp3")
print(result["text"])
```
For longer audio files:
```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
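If the model mislabels the language on short or noisy clips, the decoder can be pinned to Belarusian via the standard `generate_kwargs` parameter of the Transformers ASR pipeline (this snippet is not part of the original card and reuses the `pipe` object created above):

```python
result = pipe(
    "audio_file.mp3",
    generate_kwargs={"language": "belarusian", "task": "transcribe"},
)
print(result["text"])
```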
---
## Description
### English
This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian due to limited representation in the original training data. Through fine-tuning on Common Voice 22 Sidon, this model achieves a 64.94 percentage point improvement over the base Small model and a 36.37 percentage point improvement over the Large V3 model.
### Русский
Эта модель демонстрирует, что целенаправленное дообучение на языковых данных может значительно улучшить качество распознавания для малоресурсных языков. Базовые модели Whisper плохо справляются с белорусским языком из-за ограниченного представления в исходных обучающих данных. Благодаря дообучению на Common Voice 22 Sidon, эта модель показывает улучшение на 64.94 п.п. по сравнению с базовой Small моделью и на 36.37 п.п. по сравнению с Large V3.
### Беларуская
Гэтая мадэль дэманструе, што мэтанакіраванае данавучанне на моўных дадзеных можа значна палепшыць якасць распазнавання для маларэсурсных моў. Базавыя мадэлі Whisper дрэнна спраўляюцца з беларускай мовай з-за абмежаванага прадстаўніцтва ў зыходных навучальных дадзеных. Дзякуючы данавучанню на Common Voice 22 Sidon, гэтая мадэль паказвае паляпшэнне на 64.94 п.п. у параўнанні з базавай Small мадэллю і на 36.37 п.п. у параўнанні з Large V3.
---
## Limitations
- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
- Best results on clear audio with minimal background noise
---
## Citation
If you use this model, please cite:
```bibtex
@misc{whisper-small-be-custom,
  author    = {aleton},
  title     = {Whisper Small Belarusian Custom},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```