# Whisper Small Nepali
This model is a fine-tuned version of openai/whisper-small on a custom Nepali dataset. It reaches a 13.69% word error rate on a held-out Nepali test set; see Evaluation Results below.
## Model Details

### Model Description
- Model type: Fine-tuned Whisper Small
- Language(s): Nepali (ne)
- License: Apache 2.0
- Finetuned from model: openai/whisper-small
## Training Hyperparameters
The following hyperparameters were used during training:
- Learning Rate: 3e-05
- Train Batch Size: 4
- Eval Batch Size: 4
- Gradient Accumulation Steps: 8 (Effective batch size: 32)
- Optimizer: AdamW
- LR Scheduler: Cosine
- Warmup Steps: 500
- Max Steps: 8000
- Mixed Precision: FP16
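The hyperparameters above map directly onto `Seq2SeqTrainingArguments` keywords. The sketch below shows that mapping; the `output_dir` and the exact `optim` string are assumptions, not details taken from the original run:

```python
# Hypothetical mapping of the hyperparameters listed above onto
# Seq2SeqTrainingArguments keyword names (a sketch, not the original script).
training_kwargs = dict(
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # effective batch size: 4 * 8 = 32
    optim="adamw_torch",             # assumed AdamW variant
    lr_scheduler_type="cosine",
    warmup_steps=500,
    max_steps=8000,
    fp16=True,
)

# With transformers installed, these would be passed as:
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(output_dir="whisper-small-nepali", **training_kwargs)
```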
## Training Data
The model was trained on a custom Nepali dataset loaded via a pandas DataFrame.
- Input: Audio files (sampled at 16 kHz)
- Target: Nepali text transcriptions
- Preprocessing: log-Mel spectrogram via `WhisperFeatureExtractor`
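As a minimal sketch of this preprocessing step: the feature extractor converts raw 16 kHz audio into an 80-bin log-Mel spectrogram padded or truncated to a 30-second window. The default `WhisperFeatureExtractor()` configuration matches whisper-small; in practice you would load it with `WhisperFeatureExtractor.from_pretrained("openai/whisper-small")`.

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Default configuration matches whisper-small: 80 Mel bins, 16 kHz, 30 s context.
feature_extractor = WhisperFeatureExtractor()

audio = np.zeros(16000, dtype=np.float32)  # 1 second of silence at 16 kHz

# The extractor pads the audio to 30 s and returns log-Mel features.
features = feature_extractor(audio, sampling_rate=16000, return_tensors="np")
print(features.input_features.shape)  # (1, 80, 3000): 80 Mel bins, 3000 frames
```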
## Evaluation Results
The model was evaluated on a held-out test set of 15,000 samples.
| Metric | Score |
|---|---|
| Word Error Rate (WER) | 13.69% |
| Character Error Rate (CER) | 3.43% |
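These metrics are usually computed with a library such as `jiwer` or `evaluate`, but the metric itself is just a normalized edit distance. A self-contained sketch of WER (CER is the same computation over characters instead of words):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev = d[0]          # previous row, previous column
        d[0] = i
        for j, h in enumerate(hyp, start=1):
            cur = d[j]       # previous row, current column (not yet overwritten)
            d[j] = min(d[j] + 1,           # deletion
                       d[j - 1] + 1,       # insertion
                       prev + (r != h))    # substitution (or match, cost 0)
            prev = cur
    return d[-1] / len(ref)
```

For example, `wer("a b c", "a x c")` is 1/3: one substitution over three reference words.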
## Usage

```python
from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="bishesMaharjan/whisper-small-nepali")
transcription = transcriber("path_to_nepali_audio.wav")
print(transcription["text"])
```
## Intended Uses
- Primary Use: Transcription of Nepali speech in real-time or offline scenarios.
- Limitations: Performance may vary on dialects or noisy audio not represented in the training set.
## Training Procedure
The model was fine-tuned using the Hugging Face `Seq2SeqTrainer` with a custom `DataCollatorSpeechSeq2SeqWithPadding`.
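The exact collator used here is not published; the sketch below follows the standard Whisper fine-tuning recipe for a class of that name, assuming `processor` is a `WhisperProcessor` (feature extractor plus tokenizer). It pads audio features and token labels separately, since they have very different lengths and padding rules:

```python
from dataclasses import dataclass
from typing import Any, Dict, List

import torch


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    """Sketch of a speech seq2seq collator (assumed, not the card author's code)."""
    processor: Any
    decoder_start_token_id: int

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        # Audio side: log-Mel features are already a fixed 30 s window,
        # so this pad is effectively a batching step.
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # Text side: pad label token ids to the longest sequence in the batch.
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # Replace padding token ids with -100 so padded positions are ignored by the loss.
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )

        # If tokenization already prepended the decoder start token, drop it here;
        # the model prepends it again during training.
        if (labels[:, 0] == self.decoder_start_token_id).all().item():
            labels = labels[:, 1:]

        batch["labels"] = labels
        return batch
```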
## Framework Versions
- Transformers 4.x
- PyTorch 2.x
- Datasets 2.x
- Tokenizers 0.x