---
license: apache-2.0
language:
  - sr
base_model:
  - openai/whisper-small
datasets:
  - google/fleurs
  - Sagicc/audio-lmb-ds
  - espnet/yodas_owsmv4
  - classla/ParlaSpeech-RS
metrics:
  - wer
model-index:
  - name: Whisper Small
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: Common Voice 24.0
          type: mozilla-foundation/common_voice_24_0
          config: sr
          split: test
          args: sr
        metrics:
          - name: Wer
            type: wer
            value: 0.065924219787
library_name: transformers
---

# whisper-small-sr

Fine-tuned [openai/whisper-small](https://huggingface.co/openai/whisper-small) for Serbian automatic speech recognition.

**Output script:** this model is intended to produce Serbian Latin script only.

- WER on the Common Voice 24.0 Serbian test split: 6.59%
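A minimal transcription sketch using the `transformers` ASR pipeline. The Hub repo id `istomin9192/whisper-small-sr` and the audio path are assumptions for illustration, not confirmed by this card:

```python
# Sketch: transcribe a Serbian audio file with this checkpoint.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="istomin9192/whisper-small-sr",  # assumed Hub repo id
)

result = asr(
    "sample.wav",  # path to your audio file (assumed for illustration)
    generate_kwargs={"language": "serbian", "task": "transcribe"},
)
print(result["text"])  # Serbian text in Latin script
```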

## Model description

whisper-small-sr is a Whisper Small checkpoint fine-tuned for Serbian automatic speech recognition; transcriptions are produced in Latin script.

## Training and evaluation data

This model was fine-tuned on a mixture of publicly available Serbian speech corpora, including:

- Mozilla Common Voice 24.0 (its Serbian test split is also used for evaluation)
- FLEURS (Serbian)
- ParlaSpeech-RS (a subset of the full dataset)
- additional Serbian corpora used in the training pipeline

## Training procedure

- Epochs: 9
- Batch size: 32 / 20
- Optimizer: AdamW
- Learning rate: 6e-5 with warmup (50 steps) + cosine decay to `min_lr = 1e-7`
- Mixed precision: bfloat16 (fp32 in the final epoch)
- SpecAugment: frequency + time masking
- Sampling: weighted sampling across datasets
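The learning-rate schedule described above can be sketched as a small function. The peak LR, warmup length, and floor come from the list above; the linear warmup shape and the total-step count are assumptions:

```python
import math

# Values from the training setup above; the linear warmup shape is an assumption.
PEAK_LR = 6e-5
MIN_LR = 1e-7
WARMUP_STEPS = 50

def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at a given optimizer step: warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear warmup up to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from PEAK_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1 + math.cos(math.pi * progress))
```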

## Training results

| Epoch | Train loss | CV WER |
|-------|------------|--------|
| 1     | 0.333      | 0.1614 |
| 2     | 0.344      | 0.1278 |
| 3     | 0.251      | 0.1112 |
| 4     | 0.202      | 0.1032 |
| 5     | 0.167      | 0.0934 |
| 6     | 0.138      | 0.0790 |
| 7     | 0.118      | 0.0740 |
| 8     | 0.103      | 0.0709 |
| 9     | 0.096      | 0.0659 |

## Evaluation metrics

- WER (normalized) on the Common Voice 24.0 Serbian test split: 7.09%
- Text normalization applied before computing WER:
  - punctuation removed
  - lowercased
  - Cyrillic → Latin conversion
  - numbers converted to words
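The normalization steps above can be sketched as follows. The Cyrillic → Latin table is the standard Serbian (Gaj's Latin alphabet) transliteration; number-to-word conversion is left out of the sketch, since it would need a Serbian-aware library (e.g. num2words), and the exact library used is not stated in this card:

```python
import string

# Serbian Cyrillic -> Latin (Gaj's alphabet); digraph letters map to two characters.
_CYR2LAT = str.maketrans({
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
})

def normalize(text: str) -> str:
    """WER normalization sketch: strip punctuation, lowercase, transliterate
    Cyrillic to Latin. Number-to-word conversion is omitted here; the exact
    tool used for it is not stated in this card."""
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = text.lower()
    return text.translate(_CYR2LAT)

print(normalize("Здраво, свете!"))  # -> zdravo svete
```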