Aleton committed on
Commit 6433923 · verified · 1 Parent(s): 8a596ca

Update README.md

  1. README.md +76 -39
README.md CHANGED
@@ -8,41 +8,60 @@ tags:
8
  - generated_from_trainer
9
  - audio
10
  - speech
 
11
  datasets:
12
  - sarulab-speech/commonvoice22_sidon
 
 
13
  model-index:
14
  - name: Whisper Small Belarusian Custom
15
- results: []
16
  ---
17
 
18
  # Whisper Small Belarusian (Common Voice 22 Sidon)
19
 
20
- ## 🇺🇸 English
21
- This model is a fine-tuned version of [openai/whisper-small](https://huggingface.co/openai/whisper-small) on the Belarusian dataset from **Common Voice 22 Sidon** (`sarulab-speech/commonvoice22_sidon`). It achieves improved performance on Belarusian speech recognition compared to the base model.
22
 
23
- **Metrics (approximate):**
24
- - **WER (Word Error Rate):** ~20% (after 1200 steps)
25
- - **Loss:** ~0.21
26
 
27
- ### Usage
28
- ```python
29
- from transformers import pipeline
30
 
31
- pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
32
- result = pipe("audio_file.mp3")
33
- print(result["text"])
34
- ```
35
 
36
  ---
37
 
38
- ## 🇷🇺 Русский
39
- Эта модель является дообученной (fine-tuned) версией [openai/whisper-small](https://huggingface.co/openai/whisper-small) на наборе данных **Common Voice 22 Sidon** (белорусский язык). Модель показывает лучшие результаты распознавания белорусской речи по сравнению с базовой версией.
 
 
 
 
 
 
 
 
 
40
 
41
- **Метрики (примерные):**
42
- - **WER (Процент ошибок):** ~20% (после 1200 шагов обучения)
43
- - **Loss (Ошибка):** ~0.21
44
 
45
- ### Как использовать
46
  ```python
47
  from transformers import pipeline
48
 
@@ -51,31 +70,49 @@ result = pipe("audio_file.mp3")
51
  print(result["text"])
52
  ```
53
 
 
 
 
 
 
 
 
54
  ---
55
 
56
- ## 🇧🇾 Беларуская
57
- Гэтая мадэль з'яўляецца данавучанай (fine-tuned) версіяй [openai/whisper-small](https://huggingface.co/openai/whisper-small) на наборы дадзеных **Common Voice 22 Sidon** (беларуская мова). Мадэль паказвае лепшыя вынікі распазнавання беларускай мовы ў параўнанні з базавай версіяй.
58
 
59
- **Метрыкі (прыкладныя):**
60
- - **WER (Працэнт памылак):** ~20% (пасля 1200 крокаў навучання)
61
- - **Loss (Страты):** ~0.21
62
 
63
- ### Як выкарыстоўваць
64
- ```python
65
- from transformers import pipeline
66
 
67
- pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
68
- result = pipe("audio_file.mp3")
69
- print(result["text"])
70
- ```
71
 
72
  ---
73
 
74
- ### Training parameters / Параметры обучения / Параметры навучання
75
- - **Base Model:** openai/whisper-small
76
- - **Dataset:** sarulab-speech/commonvoice22_sidon (be)
77
- - **Learning Rate:** 1e-5
78
- - **Batch Size:** 8
79
- - **Steps:** 1200
80
- - **Framework:** PyTorch, Transformers, Accelerate
81
- ```
8
  - generated_from_trainer
9
  - audio
10
  - speech
11
+ - belarusian
12
  datasets:
13
  - sarulab-speech/commonvoice22_sidon
14
+ metrics:
15
+ - wer
16
  model-index:
17
  - name: Whisper Small Belarusian Custom
18
+ results:
19
+ - task:
20
+ type: automatic-speech-recognition
21
+ name: Speech Recognition
22
+ dataset:
23
+ name: Common Voice 22 Sidon (Belarusian)
24
+ type: sarulab-speech/commonvoice22_sidon
25
+ config: be
26
+ split: test
27
+ metrics:
28
+ - type: wer
29
+ value: 27.27
30
+ name: Word Error Rate
31
  ---
32
 
33
  # Whisper Small Belarusian (Common Voice 22 Sidon)
34
 
35
+ A fine-tuned version of openai/whisper-small for Belarusian speech recognition. In the benchmark reported in this README it reaches 27.27% WER, compared with 92.21% for the base Whisper Small and 63.64% for the much larger Whisper Large V3.
 
36
 
37
+ ---
 
 
38
 
39
+ ## Benchmark Results
 
 
40
 
41
+ | Model | Parameters | WER (lower is better) |
42
+ |:------|:----------:|----------------------:|
43
+ | openai/whisper-small (base) | 244M | 92.21% |
44
+ | openai/whisper-large-v3 | 1550M | 63.64% |
45
+ | **This model** | **244M** | **27.27%** |
46
+
47
+ **Key finding:** This fine-tuned Small model beats Whisper Large V3 by 36.37 percentage points of WER (27.27% vs 63.64%) with roughly 6x fewer parameters (244M vs 1550M).
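
For readers unfamiliar with the metric, here is a minimal sketch of how word error rate is usually defined: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The README does not say which tool or text normalization produced its numbers, so the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of three reference words gives a WER of one third.
print(f"{word_error_rate('добры дзень усім', 'добры день усім'):.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.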
48
 
49
  ---
50
 
51
+ ## Training Details
52
+
53
+ - **Base Model:** openai/whisper-small
54
+ - **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset)
55
+ - **Training Steps:** 1700
56
+ - **Learning Rate:** 1e-5
57
+ - **Batch Size:** 8
58
+ - **Final Loss:** ~0.21
59
+ - **Framework:** PyTorch, Transformers, Accelerate
60
+
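
As a rough sanity check on the training budget, the listed hyperparameters can be collected in one place and used to estimate how many examples the model saw. This is only a sketch: the `TrainingSetup` class is hypothetical, and the estimate assumes that the batch size of 8 is per device on a single device with no gradient accumulation, which the README does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters as listed in the README (hypothetical container)."""
    base_model: str = "openai/whisper-small"
    dataset: str = "sarulab-speech/commonvoice22_sidon"
    training_steps: int = 1700
    learning_rate: float = 1e-5
    batch_size: int = 8

    def examples_seen(self, gradient_accumulation_steps: int = 1,
                      num_devices: int = 1) -> int:
        # Defaults of 1 are assumptions; the README gives neither value.
        return (self.training_steps * self.batch_size
                * gradient_accumulation_steps * num_devices)


setup = TrainingSetup()
print(setup.examples_seen())  # 1700 steps * 8 examples = 13600
```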
61
+ ---
62
 
63
+ ## Usage
 
 
64
 
 
65
  ```python
66
  from transformers import pipeline
67
 
 
70
  print(result["text"])
71
  ```
72
 
73
+ For longer audio files:
74
+
75
+ ```python
76
+ result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
77
+ print(result["text"])
78
+ ```
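
If the model ever mis-detects the language, one option is to pass the language and task explicitly through `generate_kwargs`. The sketch below is not taken from the original README: the helper names `build_generate_kwargs` and `transcribe` are hypothetical, and whether the explicit hint changes results for this checkpoint is untested.

```python
def build_generate_kwargs(language: str = "belarusian",
                          task: str = "transcribe") -> dict:
    """Decoding options forwarded to Whisper's generate() by the pipeline."""
    return {"language": language, "task": task}


def transcribe(path: str, long_form: bool = False) -> str:
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    pipe = pipeline("automatic-speech-recognition",
                    model="aleton/whisper-small-be-custom")
    call_kwargs = {"generate_kwargs": build_generate_kwargs()}
    if long_form:
        # Same chunking settings as the README's long-audio example.
        call_kwargs.update(chunk_length_s=30, batch_size=8)
    return pipe(path, **call_kwargs)["text"]


# Example call (downloads the model on first use):
# print(transcribe("long_audio.mp3", long_form=True))
```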
79
+
80
  ---
81
 
82
+ ## Description
 
83
 
84
+ ### English
 
 
85
 
86
+ This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian due to limited representation in the original training data. Through fine-tuning on Common Voice 22 Sidon, this model achieves a 64.94 percentage point improvement over the base Small model and a 36.37 percentage point improvement over the Large V3 model.
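
The improvement figures quoted above follow directly from the benchmark table. This short check recomputes them and adds the relative WER reduction, which the README does not report. The helper names are made up for illustration.

```python
# WER values (percent) and parameter counts (millions) from the benchmark table.
wer_percent = {
    "openai/whisper-small": 92.21,
    "openai/whisper-large-v3": 63.64,
    "this model": 27.27,
}
params_millions = {"openai/whisper-large-v3": 1550, "this model": 244}


def percentage_point_gain(baseline: float, candidate: float) -> float:
    """Absolute WER difference in percentage points."""
    return round(baseline - candidate, 2)


def relative_reduction(baseline: float, candidate: float) -> float:
    """WER reduction as a percentage of the baseline WER."""
    return round(100 * (baseline - candidate) / baseline, 1)


ours = wer_percent["this model"]
print(percentage_point_gain(wer_percent["openai/whisper-small"], ours))     # 64.94
print(percentage_point_gain(wer_percent["openai/whisper-large-v3"], ours))  # 36.37
print(relative_reduction(wer_percent["openai/whisper-large-v3"], ours))     # 57.1
size_ratio = params_millions["openai/whisper-large-v3"] / params_millions["this model"]
print(round(size_ratio, 2))                                                 # 6.35
```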
 
 
87
 
88
+ ### Русский
89
+
90
+ Эта модель демонстрирует, что целенаправленное дообучение на языковых данных может значительно улучшить качество распознавания для малоресурсных языков. Базовые модели Whisper плохо справляются с белорусским языком из-за ограниченного представления в исходных обучающих данных. Благодаря дообучению на Common Voice 22 Sidon, эта модель показывает улучшение на 64.94 п.п. по сравнению с базовой Small моделью и на 36.37 п.п. по сравнению с Large V3.
91
+
92
+ ### Беларуская
93
+
94
+ Гэтая мадэль дэманструе, што мэтанакіраванае данавучанне на моўных дадзеных можа значна палепшыць якасць распазнавання для маларэсурсных моў. Базавыя мадэлі Whisper дрэнна спраўляюцца з беларускай мовай з-за абмежаванага прадстаўніцтва ў зыходных навучальных дадзеных. Дзякуючы данавучанню на Common Voice 22 Sidon, гэтая мадэль паказвае паляпшэнне на 64.94 п.п. у параўнанні з базавай Small мадэллю і на 36.37 п.п. у параўнанні з Large V3.
95
 
96
  ---
97
 
98
+ ## Limitations
99
+
100
+ - Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
101
+ - Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
102
+ - Best results on clear audio with minimal background noise
103
+
104
+ ---
105
+
106
+ ## Citation
107
+
108
+ If you use this model, please cite:
109
+
110
+ ```bibtex
111
+ @misc{whisper-small-be-custom,
112
+ author = {aleton},
113
+ title = {Whisper Small Belarusian Custom},
114
+ year = {2024},
115
+ publisher = {Hugging Face},
116
+ url = {https://huggingface.co/aleton/whisper-small-be-custom}
117
+ }
118
+ ```