---
license: apache-2.0
language:
- sr
base_model:
- openai/whisper-small
datasets:
- google/fleurs
- Sagicc/audio-lmb-ds
- espnet/yodas_owsmv4
- classla/ParlaSpeech-RS
metrics:
- wer
model-index:
- name: Whisper Small
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: Common Voice 24.0
type: mozilla-foundation/common_voice_24_0
config: sr
split: test
args: sr
metrics:
- name: Wer
type: wer
value: 0.065924219787
library_name: transformers
---

# whisper-small-sr
Fine-tuned from OpenAI Whisper Small for Serbian automatic speech recognition.

Output script: this model is intended to produce Serbian Latin text only.

- WER on the Common Voice 24.0 Serbian test set: 6.59%
## Model description

## Training and evaluation data
This model was fine-tuned on a mixture of publicly available Serbian speech corpora, including:
- Mozilla Common Voice 24.0 (Serbian), with evaluation on its test split
- FLEURS Serbian
- ParlaSpeech-RS (subset of the full dataset)
- Additional Serbian corpora used in the training pipeline
## Training procedure
- Epochs: 9
- Batch size: 32 / 20
- Optimizer: AdamW
- LR: 6e-5 with warmup (50 steps) + cosine decay to min_lr = 1e-7
- Mixed precision: bfloat16 (fp32 in the final epoch)
- SpecAugment: frequency + time masking
- Sampling: weighted sampling across datasets
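The schedule above (linear warmup for 50 steps to 6e-5, then cosine decay to min_lr = 1e-7) can be sketched as follows; `total_steps` is a placeholder, since the card does not state the total step count:

```python
# Sketch of the LR schedule described above: 50-step linear warmup to a peak
# of 6e-5, then cosine decay down to min_lr = 1e-7. total_steps is assumed.
import math

def lr_at_step(step, total_steps, peak_lr=6e-5, min_lr=1e-7, warmup_steps=50):
    if step < warmup_steps:
        # linear warmup from 0 toward peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # cosine decay from peak_lr to min_lr over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at_step(49, 1000))    # end of warmup -> 6e-5
print(lr_at_step(1000, 1000))  # fully decayed -> 1e-7
```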
## Training results
| Epoch | Train loss | CV WER |
|---|---|---|
| 1 | 0.333 | 0.1614 |
| 2 | 0.344 | 0.1278 |
| 3 | 0.251 | 0.1112 |
| 4 | 0.202 | 0.1032 |
| 5 | 0.167 | 0.0934 |
| 6 | 0.138 | 0.0790 |
| 7 | 0.118 | 0.0740 |
| 8 | 0.103 | 0.0709 |
| 9 | 0.096 | 0.0659 |
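The CV WER column above is word error rate. A minimal sketch of the standard word-level edit-distance computation (not necessarily the exact tooling used for this card):

```python
# Word error rate via Levenshtein edit distance over word sequences,
# i.e. (substitutions + insertions + deletions) / reference word count.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(1, len(ref))

print(wer("dobar dan svete", "dobar dan"))  # 1 deletion / 3 words ≈ 0.333
```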
## Evaluation metrics
- WER (normalized) on the Common Voice 24.0 Serbian test set: 6.59%
- Text normalization used for WER:
  - punctuation removed
  - lowercased
  - Cyrillic → Latin conversion
  - numbers converted to words
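The first three normalization steps can be sketched as below. The Cyrillic → Latin table follows the standard Serbian transliteration; number-to-word conversion is omitted, since the card does not name the tool used for it:

```python
# Sketch of the WER text normalization described above: lowercase, strip
# punctuation, and transliterate Serbian Cyrillic to Latin.
import string

CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def normalize(text: str) -> str:
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)

print(normalize("Добар дан, свете!"))  # -> "dobar dan svete"
```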