whisper-small-sr

A fine-tuned version of OpenAI Whisper Small for Serbian automatic speech recognition.

Output script: this model is intended to produce Serbian Latin only.
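A minimal transcription sketch using the Hugging Face `transformers` pipeline (the audio file name is a placeholder; generation settings are an assumption, not taken from this card):

```python
# Sketch: transcribe a Serbian audio file with this model via transformers.
# Requires `transformers` and `torch`; "example_serbian.wav" is a placeholder path.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="istomin9192/whisper-small-sr",
)

# Whisper works on 16 kHz audio; the pipeline resamples common formats itself.
result = asr("example_serbian.wav")
print(result["text"])  # Serbian transcription in Latin script
```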

  • WER on Common Voice 24.0 Serbian test: 7.09%

Model description

This model is a fine-tuned checkpoint of OpenAI Whisper Small for Serbian automatic speech recognition. It transcribes Serbian speech into the Latin script and reaches a word error rate (WER) of 7.09% on the Common Voice 24.0 Serbian test set.

Training and evaluation data

This model was fine-tuned on a mixture of publicly available Serbian speech corpora, including:

  • Mozilla Common Voice 24.0, evaluated on CV test (sr)
  • FLEURS Serbian
  • ParlaSpeech-RS (subset of the full dataset)
  • Additional Serbian corpora used in the training pipeline

Training procedure

  • Epochs: 8
  • Batch size: 32
  • Optimizer: AdamW
  • LR: 6e-5 with warmup (50 steps) + cosine decay to min_lr = 1e-7
  • Mixed precision: bfloat16
  • SpecAugment: frequency + time masking
  • Sampling: weighted sampling across datasets
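The learning-rate schedule above (linear warmup for 50 steps, then cosine decay from 6e-5 down to min_lr = 1e-7) can be sketched in plain Python; the total step count below is a stand-in value, not taken from this card:

```python
import math

def lr_at(step: int, total_steps: int,
          peak_lr: float = 6e-5, warmup_steps: int = 50,
          min_lr: float = 1e-7) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# End of warmup hits the peak; the final step lands on min_lr.
print(lr_at(50, 10_000))      # 6e-05
print(lr_at(10_000, 10_000))  # 1e-07
```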

Training results

Epoch   Train loss   CV WER
1       0.331        0.1562
2       0.338        0.1202
3       0.241        0.1062
4       0.187        0.0913
5       0.150        0.0853
6       0.122        0.0745
7       0.106        0.0709

Evaluation metrics

  • WER (normalized) on Common Voice 24.0 Serbian test: 7.09%
  • Text normalization used for WER:
    • punctuation removed
    • lowercased
    • Cyrillic → Latin conversion
    • numbers converted to words
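The normalization steps above, followed by a standard WER computation, can be sketched as follows. This is a plain-Python sketch, not the card's actual evaluation code: the number-to-words step is omitted (it needs a Serbian-aware library such as num2words), and the transliteration table covers the standard Serbian Cyrillic alphabet.

```python
import re

# Standard Serbian Cyrillic -> Latin transliteration (lowercase forms).
CYR2LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, transliterate Cyrillic to Latin.

    Number-to-word conversion is intentionally left out of this sketch.
    """
    text = text.lower()
    text = re.sub(r"[^\w\s]", "", text)  # drop punctuation (\w is Unicode-aware)
    text = "".join(CYR2LAT.get(ch, ch) for ch in text)
    return " ".join(text.split())        # collapse whitespace

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(normalize("Здраво, свете!"))  # zdravo svete
```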
