---
language:
- be
license: apache-2.0
tags:
- automatic-speech-recognition
- whisper
- generated_from_trainer
- audio
- speech
- belarusian
datasets:
- sarulab-speech/commonvoice22_sidon
metrics:
- wer
model-index:
- name: Whisper Small Belarusian Custom
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 22 Sidon (Belarusian)
      type: sarulab-speech/commonvoice22_sidon
      config: be
      split: test
    metrics:
    - type: wer
      value: 27.27
      name: Word Error Rate
---

# Whisper Small Belarusian (Common Voice 22 Sidon)

A fine-tuned version of openai/whisper-small optimized for Belarusian speech recognition. This model substantially outperforms both the base Whisper Small and the much larger Whisper Large V3 on Belarusian speech.

---

## Benchmark Results

| Model | Parameters | WER (lower is better) |
|:------|:----------:|----------------------:|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| **This model** | **244M** | **27.27%** |

**Key finding:** This fine-tuned Small model outperforms Whisper Large V3 by 36.37 percentage points while being roughly 6x smaller.

---

## Training Details

- **Base Model:** openai/whisper-small
- **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset)
- **Training Steps:** 1700
- **Learning Rate:** 1e-5
- **Batch Size:** 8
- **Final Loss:** ~0.21
- **Framework:** PyTorch, Transformers, Accelerate

---

## Usage

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
result = pipe("audio_file.mp3")
print(result["text"])
```

For longer audio files, transcribe in 30-second chunks:

```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```

---

## Description

### English

This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages.
The base Whisper models struggle with Belarusian due to limited representation in the original training data. Through fine-tuning on Common Voice 22 Sidon, this model achieves a 64.94 percentage point improvement over the base Small model and a 36.37 percentage point improvement over the Large V3 model.

### Русский

Эта модель демонстрирует, что целенаправленное дообучение на языковых данных может значительно улучшить качество распознавания для малоресурсных языков. Базовые модели Whisper плохо справляются с белорусским языком из-за ограниченного представления в исходных обучающих данных. Благодаря дообучению на Common Voice 22 Sidon, эта модель показывает улучшение на 64.94 п.п. по сравнению с базовой Small моделью и на 36.37 п.п. по сравнению с Large V3.

### Беларуская

Гэтая мадэль дэманструе, што мэтанакіраванае данавучанне на моўных дадзеных можа значна палепшыць якасць распазнавання для маларэсурсных моў. Базавыя мадэлі Whisper дрэнна спраўляюцца з беларускай мовай з-за абмежаванага прадстаўніцтва ў зыходных навучальных дадзеных. Дзякуючы данавучанню на Common Voice 22 Sidon, гэтая мадэль паказвае паляпшэнне на 64.94 п.п. у параўнанні з базавай Small мадэллю і на 36.37 п.п. у параўнанні з Large V3.

---

## Limitations

- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
- Best results on clear audio with minimal background noise

---

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-small-be-custom,
  author    = {aleton},
  title     = {Whisper Small Belarusian Custom},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```
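---

## Appendix: How WER Is Computed

The WER numbers in the benchmark table are word-level edit distance divided by the number of words in the reference transcript. The card does not state which tool produced them, so the following is an illustrative, self-contained sketch of the metric itself, not the actual evaluation script:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of two reference words gives WER 0.5:
print(wer("кот спіць", "кот спявае"))  # → 0.5
```

In practice, transcripts are usually normalized (lowercasing, punctuation stripping) before scoring; this sketch scores raw whitespace-separated tokens.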