---
language:
- be
license: apache-2.0
tags:
- automatic-speech-recognition
- whisper
- generated_from_trainer
- audio
- speech
- belarusian
datasets:
- sarulab-speech/commonvoice22_sidon
metrics:
- wer
model-index:
- name: Whisper Small Belarusian Custom
results:
- task:
type: automatic-speech-recognition
name: Speech Recognition
dataset:
name: Common Voice 22 Sidon (Belarusian)
type: sarulab-speech/commonvoice22_sidon
config: be
split: test
metrics:
- type: wer
value: 27.27
name: Word Error Rate
---
# Whisper Small Belarusian (Common Voice 22 Sidon)
A fine-tuned version of openai/whisper-small optimized for Belarusian speech recognition. On Belarusian speech it outperforms not only the base Whisper Small but also the much larger Whisper Large V3.
---
## Benchmark Results
| Model | Parameters | WER (lower is better) |
|:------|:----------:|----------------------:|
| openai/whisper-small (base) | 244M | 92.21% |
| openai/whisper-large-v3 | 1550M | 63.64% |
| **This model** | **244M** | **27.27%** |
**Key finding:** this fine-tuned Small model outperforms Whisper Large V3 by 36.37 percentage points while being roughly 6x smaller.
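The exact evaluation script is not published with this card, but a minimal sketch of how such a WER figure can be reproduced with the `transformers` pipeline and the `evaluate` library is shown below. The dataset identifier, config, and split come from this card's metadata; the column names and decoding settings are assumptions.

```python
import evaluate
from datasets import load_dataset
from transformers import pipeline

# Belarusian test split used for the benchmark (per this card's metadata).
ds = load_dataset("sarulab-speech/commonvoice22_sidon", "be", split="test")

pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
wer = evaluate.load("wer")

# Transcribe each clip and compare against the reference transcription.
# Column names ("audio", "sentence") follow Common Voice conventions and are assumptions.
predictions = [pipe(sample["audio"])["text"] for sample in ds]
references = [sample["sentence"] for sample in ds]

print(f"WER: {100 * wer.compute(predictions=predictions, references=references):.2f}%")
```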
---
## Training Details
- **Base Model:** openai/whisper-small
- **Dataset:** sarulab-speech/commonvoice22_sidon (Belarusian subset)
- **Training Steps:** 1700
- **Learning Rate:** 1e-5
- **Batch Size:** 8
- **Final Loss:** ~0.21
- **Framework:** PyTorch, Transformers, Accelerate
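The training script itself is not published here; the following is a minimal sketch of a `Seq2SeqTrainingArguments` setup matching the hyperparameters listed above. The output directory, logging, and checkpointing settings are illustrative, not the values actually used.

```python
from transformers import Seq2SeqTrainingArguments

# Hyperparameters mirror the list above; everything else is an illustrative default.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-be-custom",  # illustrative path
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=1700,
    fp16=True,                     # assumption: mixed-precision training on GPU
    predict_with_generate=True,    # decode with generate() during evaluation
    logging_steps=25,              # illustrative
    save_steps=500,                # illustrative
)
```

These arguments would plug into a standard `Seq2SeqTrainer` together with the Whisper processor and a data collator that pads audio features and label sequences.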
---
## Usage
```python
from transformers import pipeline
pipe = pipeline("automatic-speech-recognition", model="aleton/whisper-small-be-custom")
result = pipe("audio_file.mp3")
print(result["text"])
```
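If the checkpoint's saved generation config does not already pin the target language, Whisper's language and task can be set explicitly through `generate_kwargs`. Whether this is needed depends on how the checkpoint was exported, so treat it as an optional safeguard:

```python
result = pipe(
    "audio_file.mp3",
    generate_kwargs={"language": "belarusian", "task": "transcribe"},
)
print(result["text"])
```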
For longer audio files:
```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```
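For long recordings you may also want segment timestamps; the pipeline supports this via `return_timestamps=True`, which adds a `chunks` field alongside the full transcript:

```python
result = pipe("long_audio.mp3", chunk_length_s=30, batch_size=8, return_timestamps=True)
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])
```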
---
## Description
This model demonstrates that targeted fine-tuning on language-specific data can dramatically improve performance for low-resource languages. The base Whisper models struggle with Belarusian due to its limited representation in the original training data. Through fine-tuning on Common Voice 22 Sidon, this model achieves a 64.94 percentage point improvement over the base Small model and a 36.37 percentage point improvement over the Large V3 model. (The original Russian and Belarusian versions of this section were direct translations of the same text.)
---
## Limitations
- Optimized specifically for Belarusian; performance on other languages may be degraded compared to the base model
- Trained on Common Voice data, which may not fully represent all dialects or acoustic conditions
- Best results on clear audio with minimal background noise
---
## Citation
If you use this model, please cite:
```bibtex
@misc{whisper-small-be-custom,
  author    = {aleton},
  title     = {Whisper Small Belarusian Custom},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/aleton/whisper-small-be-custom}
}
```