---
language:
- be
license: apache-2.0
tags:
- automatic-speech-recognition
- whisper
- generated_from_trainer
- audio
- speech
- belarusian
datasets:
- sarulab-speech/commonvoice22_sidon
metrics:
- wer
model-index:
- name: Whisper Small Belarusian Custom
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 22 Sidon (Belarusian)
      type: sarulab-speech/commonvoice22_sidon
      config: be
      split: test
    metrics:
    - type: wer
      value: 27.27
      name: Word Error Rate
---

# Whisper Small Belarusian (Common Voice 22 Sidon)

A fine-tuned version of openai/whisper-small for Belarusian speech recognition. On the Belarusian test split of Common Voice 22 Sidon it reaches a word error rate (WER) of 27.27%, well below both the base Whisper Small and the much larger Whisper Large V3.

---

## Benchmark Results

| Model | Parameters | WER (lower is better) |
|:------|:----------:|----------------------:|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| **This model** | **244M** | **27.27%** |

**Key finding:** This fine-tuned Small model beats Whisper Large V3 by 36.37 percentage points of WER while having roughly 6x fewer parameters (244M vs. 1550M).
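
For readers unfamiliar with the metric, the sketch below shows the standard definition of WER: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This card does not state which tool or text normalization produced the numbers above, so the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the evaluation script that was actually used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace with no extra normalization;
    the real evaluation may have lowercased text or stripped punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four reference words gives a WER of 25%.
print(f"{word_error_rate('a b c d', 'a x c d'):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.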

---

## Training Details

- **Base Model:** openai/whisper-small
- **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset)
- **Training Steps:** 1700
- **Learning Rate:** 1e-5
- **Batch Size:** 8
- **Final Loss:** ~0.21
- **Framework:** PyTorch, Transformers, Accelerate
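
The hyperparameters above could map onto `Seq2SeqTrainingArguments` from Transformers roughly as follows. This is a sketch under assumptions, not the original training script: the card does not say whether the batch size of 8 is per device or effective, whether gradient accumulation or warmup was used, or what the output directory was. The helper names and the `./whisper-small-be` path are hypothetical.

```python
def build_training_kwargs(output_dir: str = "./whisper-small-be") -> dict:
    """Collect the hyperparameters listed in this card as keyword arguments."""
    return {
        "output_dir": output_dir,           # hypothetical path, not from the card
        "max_steps": 1700,                  # "Training Steps" in the card
        "learning_rate": 1e-5,              # "Learning Rate" in the card
        "per_device_train_batch_size": 8,   # assumes "Batch Size" is per device
        "predict_with_generate": True,      # assumption: needed to compute WER in eval
    }


def build_training_args(output_dir: str = "./whisper-small-be"):
    """Create the arguments object; transformers is imported only when called."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**build_training_kwargs(output_dir))


print(build_training_kwargs())
```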

---

## Usage

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
result = pipe("audio_file.mp3")
print(result["text"])
```

For audio longer than 30 seconds, reuse the `pipe` object from above and enable chunking:

```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
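
Whisper normally detects the language automatically. If detection picks a different language on short or noisy clips, the language and task can be set explicitly through `generate_kwargs`. The sketch below is a hedged example: the helper name `transcribe_belarusian` is hypothetical, and the card does not confirm that this checkpoint's generation config accepts a `language` argument, so treat it as something to try rather than a documented feature of this model.

```python
# Generation options passed through to Whisper's generate() call.
BELARUSIAN_GENERATE_KWARGS = {"language": "belarusian", "task": "transcribe"}


def transcribe_belarusian(
    audio_path: str,
    model_id: str = "aleton/whisper-small-be-custom",
) -> str:
    """Transcribe one file with the language fixed to Belarusian."""
    # Imported lazily so the module loads even without transformers installed.
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition", model=model_id)
    result = pipe(
        audio_path,
        chunk_length_s=30,  # split long recordings into 30-second windows
        generate_kwargs=BELARUSIAN_GENERATE_KWARGS,
    )
    return result["text"]
```

Calling `transcribe_belarusian("audio_file.mp3")` downloads the model on first use and returns the transcript as a string.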

---

## Description

### English

This model demonstrates that targeted fine-tuning on language-specific data can substantially improve recognition quality for low-resource languages. The base Whisper models struggle with Belarusian because the language is poorly represented in their original training data. After fine-tuning on Common Voice 22 Sidon, this model lowers WER by 64.94 percentage points compared with the base Small model and by 36.37 percentage points compared with the Large V3 model.
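
The percentage-point figures are plain differences between WER values. The short sketch below recomputes them from the reported numbers (27.27% for this model, taken from the card metadata) and also shows the relative error reduction, which is a different quantity that the card does not report. The variable names are illustrative.

```python
# WER values as reported in this card, in percent.
WER_BASE_SMALL = 92.21
WER_LARGE_V3 = 63.64
WER_THIS_MODEL = 27.27


def point_drop(baseline: float, model: float) -> float:
    """Absolute improvement in percentage points."""
    return round(baseline - model, 2)


def relative_reduction(baseline: float, model: float) -> float:
    """Share of the baseline's errors that were removed, in percent."""
    return round(100 * (baseline - model) / baseline, 1)


for name, baseline in [("whisper-small", WER_BASE_SMALL), ("whisper-large-v3", WER_LARGE_V3)]:
    print(
        f"vs {name}: {point_drop(baseline, WER_THIS_MODEL)} points, "
        f"{relative_reduction(baseline, WER_THIS_MODEL)}% relative"
    )

# Parameter ratio behind the "roughly 6x" size comparison.
print(f"size ratio: {1550 / 244:.2f}x")
```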

### Русский

Эта модель демонстрирует, что целенаправленное дообучение на языковых данных может значительно улучшить качество распознавания для малоресурсных языков. Базовые модели Whisper плохо справляются с белорусским языком из-за ограниченного представления в исходных обучающих данных. Благодаря дообучению на Common Voice 22 Sidon, эта модель показывает улучшение на 64.94 п.п. по сравнению с базовой Small моделью и на 36.37 п.п. по сравнению с Large V3.

### Беларуская

Гэтая мадэль дэманструе, што мэтанакіраванае данавучанне на моўных дадзеных можа значна палепшыць якасць распазнавання для маларэсурсных моў. Базавыя мадэлі Whisper дрэнна спраўляюцца з беларускай мовай з-за абмежаванага прадстаўніцтва ў зыходных навучальных дадзеных. Дзякуючы данавучанню на Common Voice 22 Sidon, гэтая мадэль паказвае паляпшэнне на 64.94 п.п. у параўнанні з базавай Small мадэллю і на 36.37 п.п. у параўнанні з Large V3.

---

## Limitations

- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
- Best results on clear audio with minimal background noise

---

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-small-be-custom,
  author = {aleton},
  title = {Whisper Small Belarusian Custom},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```