whisper-small-ks / README.md
metadata
library_name: transformers
language:
  - ks
base_model: openai/whisper-small
tags:
  - generated_from_trainer
datasets:
  - muneebharoon/whisper-kashmiri
metrics:
  - wer
model-index:
  - name: Whisper Small ks - Muneeb Haroon
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: whisper-kashmiri
          type: muneebharoon/whisper-kashmiri
          args: 'config: ks, split: test'
        metrics:
          - name: Wer
            type: wer
            value: 39.80769230769231

Whisper Small ks - Muneeb Haroon

This model is a fine-tuned version of openai/whisper-small on the whisper-kashmiri dataset (muneebharoon/whisper-kashmiri). It achieves the following results on the evaluation set, which match the step-5000 row of the training results table:

  • Loss: 1.1578
  • WER: 39.8077
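
For readers unfamiliar with the metric: WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The figure reported here appears to be expressed as a percentage. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation code used for this model (which the card does not show); the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words consumed so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Hypothetical example: one substitution ("cat" -> "dog") and one deletion ("on")
# against a five-word reference.
print(word_error_rate("the cat sat on mat", "the dog sat mat"))  # 40.0
```

Read this way, a WER of 39.8077 means roughly four word-level errors for every ten reference words. Note that this sketch splits on whitespace only; any text normalization applied during the actual evaluation is not documented in this card.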

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 2
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 10000
  • mixed_precision_training: Native AMP
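
Two of the values above are derived rather than set directly, and the schedule is easier to picture with numbers. The following is a minimal sketch in plain Python, not the training script: it assumes a single device (consistent with 2 × 4 = 8) and assumes the linear scheduler ramps the learning rate from 0 to its peak over the warmup steps and then decays it linearly to 0 at the final step. The helper names `effective_batch_size` and `linear_lr` are hypothetical.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 10_000
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 4
NUM_DEVICES = 1  # assumption: not stated in the card, but consistent with a total of 8


def effective_batch_size(per_device: int, accum_steps: int, devices: int = NUM_DEVICES) -> int:
    """Examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay phase: fall linearly from the peak to 0 at the last training step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 500 and is back to half its peak by step 5250, so the later checkpoints were trained with progressively smaller updates.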

Training results

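| Training Loss | Epoch    | Step  | Validation Loss | WER     |
|---------------|----------|-------|-----------------|---------|
| 0.0123        | 21.2811  | 1000  | 0.9382          | 48.125  |
| 0.0051        | 42.5622  | 2000  | 0.9946          | 42.4519 |
| 0.0032        | 63.8432  | 3000  | 1.0278          | 41.3942 |
| 0.0           | 85.1081  | 4000  | 1.1138          | 40.5288 |
| 0.0           | 106.3892 | 5000  | 1.1578          | 39.8077 |
| 0.0           | 127.6703 | 6000  | 1.1869          | 39.8077 |
| 0.0           | 148.9514 | 7000  | 1.2211          | 40.0    |
| 0.0           | 170.2162 | 8000  | 1.2430          | 40.2404 |
| 0.0           | 191.4973 | 9000  | 1.2679          | 40.2885 |
| 0.0           | 212.7784 | 10000 | 1.2762          | 40.3365 |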
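
The two evaluation metrics in these results disagree about which checkpoint is best: validation loss is lowest at step 1000 and rises at every later evaluation, while WER keeps improving until step 5000 and then drifts slightly upward. Together with a logged training loss of 0.0 from step 4000 onward, this pattern is consistent with overfitting, though the card does not say so. The sketch below simply selects the best row under each criterion; the helper names are hypothetical and the rows are copied from the results above.

```python
# (step, validation_loss, wer) rows copied from the training results.
RESULTS = [
    (1000, 0.9382, 48.125),
    (2000, 0.9946, 42.4519),
    (3000, 1.0278, 41.3942),
    (4000, 1.1138, 40.5288),
    (5000, 1.1578, 39.8077),
    (6000, 1.1869, 39.8077),
    (7000, 1.2211, 40.0),
    (8000, 1.2430, 40.2404),
    (9000, 1.2679, 40.2885),
    (10000, 1.2762, 40.3365),
]


def best_by_wer(results):
    """Row with the lowest WER; min() keeps the first minimum, so ties go to the earliest step."""
    return min(results, key=lambda row: row[2])


def best_by_loss(results):
    """Row with the lowest validation loss."""
    return min(results, key=lambda row: row[1])


print("best by WER: ", best_by_wer(RESULTS))
print("best by loss:", best_by_loss(RESULTS))
```

Selecting by WER returns the step-5000 row, which is the checkpoint whose loss and WER are reported at the top of this card; steps 5000 and 6000 tie on WER, and the earlier one is kept.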

Framework versions

  • Transformers 4.49.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0