Whisper Small Nepali

This model is a fine-tuned version of openai/whisper-small on a custom Nepali dataset. It achieves a word error rate (WER) of 13.69% and a character error rate (CER) of 3.43% on a held-out Nepali ASR test set.

Model Details

Model Description

  • Model type: Fine-tuned Whisper Small
  • Language(s): Nepali (ne)
  • License: Apache 2.0
  • Finetuned from model: openai/whisper-small

Training Hyperparameters

The following hyperparameters were used during training:

  • Learning Rate: 3e-05
  • Train Batch Size: 4
  • Eval Batch Size: 4
  • Gradient Accumulation Steps: 8 (Effective batch size: 32)
  • Optimizer: AdamW
  • LR Scheduler: Cosine
  • Warmup Steps: 500
  • Max Steps: 8000
  • Mixed Precision: FP16
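The hyperparameters above map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows. This is an illustrative sketch, not the original training script; `output_dir` and any setting not listed above (e.g. `predict_with_generate`) are assumptions. The AdamW optimizer is the Trainer default, so it needs no explicit argument.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the listed hyperparameters; output_dir is an
# assumed path, not taken from the original run.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-nepali",  # assumption
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,        # effective batch size: 4 * 8 = 32
    lr_scheduler_type="cosine",
    warmup_steps=500,
    max_steps=8000,
    fp16=True,                            # mixed precision
    predict_with_generate=True,           # assumed, needed for WER/CER eval
)
```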

Training Data

The model was trained on a custom Nepali dataset loaded via a Pandas DataFrame.

  • Input: Audio files (sampled at 16 kHz)
  • Target: Nepali text transcriptions
  • Preprocessing: Log-Mel spectrograms via WhisperFeatureExtractor
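The preprocessing step can be sketched as follows, using `WhisperFeatureExtractor` with its default settings (the actual pipeline presumably loads the extractor from the openai/whisper-small checkpoint, and the silent clip below stands in for a real Nepali recording):

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Default Whisper settings: 16 kHz input, 80 mel bins, 30 s window
feature_extractor = WhisperFeatureExtractor()

# One second of silence stands in for a real 16 kHz Nepali clip
audio = np.zeros(16000, dtype=np.float32)

features = feature_extractor(audio, sampling_rate=16000, return_tensors="np")

# Audio is padded/truncated to 30 s and converted to a log-Mel spectrogram
# of shape (batch, mel_bins, frames) == (1, 80, 3000)
print(features.input_features.shape)
```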

Evaluation Results

The model was evaluated on a held-out test set of 15,000 samples.

  • Word Error Rate (WER): 13.69%
  • Character Error Rate (CER): 3.43%
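For reference, WER and CER are both edit-distance metrics: the number of word-level (or character-level) insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length. A minimal pure-Python implementation (not the evaluation code used here, which likely relied on a library such as `evaluate` or `jiwer`) looks like this:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                              # deletion
                        dp[j - 1] + 1,                          # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))      # substitution
            prev = cur
    return dp[n]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# One substituted word out of four reference words -> WER 0.25
print(wer("the cat sat down", "the cat sat up"))  # 0.25
```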

Usage

from transformers import pipeline

# Load the fine-tuned Nepali model via the ASR pipeline
transcriber = pipeline("automatic-speech-recognition", model="bishesMaharjan/whisper-small-nepali")
transcription = transcriber("path_to_nepali_audio.wav")

print(transcription["text"])

Intended Uses

  • Primary Use: Transcription of Nepali speech in real-time or offline scenarios.
  • Limitations: Performance may vary on dialects or noisy audio not represented in the training set.

Training Procedure

The model was fine-tuned using the Hugging Face Seq2SeqTrainer with a custom DataCollatorSpeechSeq2SeqWithPadding.
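The key job of such a collator is to pad the audio features and the label token IDs separately, masking label padding with -100 so those positions are ignored by the cross-entropy loss. A minimal pure-Python sketch of the label-side logic (the real collator also pads the log-Mel features via the feature extractor and works on tensors; `pad_id=0` is an illustrative placeholder, not the tokenizer's actual pad token):

```python
def pad_labels(batch_label_ids, pad_id=0):
    """Pad variable-length label sequences to a common length, then mask
    the padding with -100, the index ignored by cross-entropy loss."""
    max_len = max(len(ids) for ids in batch_label_ids)
    masked_batch = []
    for ids in batch_label_ids:
        padded = ids + [pad_id] * (max_len - len(ids))
        # Positions beyond the real labels are masked out of the loss
        masked = [tok if i < len(ids) else -100 for i, tok in enumerate(padded)]
        masked_batch.append(masked)
    return masked_batch

batch = pad_labels([[5, 6, 7], [5, 6]])
print(batch)  # [[5, 6, 7], [5, 6, -100]]
```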

Framework Versions

  • Transformers 4.x
  • PyTorch 2.x
  • Datasets 2.x
  • Tokenizers 0.x