# Whisper Small Nepali
This model is a fine-tuned version of openai/whisper-small on a custom Nepali dataset. It reaches a 13.69% word error rate on a held-out Nepali test set; see Evaluation Results below.
## Model Details

### Model Description
- Model type: Fine-tuned Whisper Small
- Language(s): Nepali (ne)
- License: Apache 2.0
- Finetuned from model: openai/whisper-small
## Training Hyperparameters
The following hyperparameters were used during training:
- Learning Rate: 3e-05
- Train Batch Size: 4
- Eval Batch Size: 4
- Gradient Accumulation Steps: 8 (Effective batch size: 32)
- Optimizer: AdamW
- LR Scheduler: Cosine
- Warmup Steps: 500
- Max Steps: 8000
- Mixed Precision: FP16
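The hyperparameters above map directly onto `Seq2SeqTrainingArguments` keywords. The sketch below shows that mapping; the `output_dir` and the exact `optim` string are assumptions, not details taken from the original run:

```python
# Hypothetical mapping of the hyperparameters listed above onto
# Seq2SeqTrainingArguments keyword names (a sketch, not the original script).
training_kwargs = dict(
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=8,   # effective batch size: 4 * 8 = 32
    optim="adamw_torch",             # assumed AdamW variant
    lr_scheduler_type="cosine",
    warmup_steps=500,
    max_steps=8000,
    fp16=True,
)

# With transformers installed, these would be passed as:
# from transformers import Seq2SeqTrainingArguments
# args = Seq2SeqTrainingArguments(output_dir="whisper-small-nepali", **training_kwargs)
```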
## Training Data
The model was trained on a custom Nepali dataset loaded via a pandas DataFrame.
- Input: Audio files (sampled at 16 kHz)
- Target: Nepali text transcriptions
- Preprocessing: log-Mel spectrogram via `WhisperFeatureExtractor`
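As a minimal sketch of this preprocessing step: the feature extractor converts raw 16 kHz audio into an 80-bin log-Mel spectrogram padded or truncated to a 30-second window. The default `WhisperFeatureExtractor()` configuration matches whisper-small; in practice you would load it with `WhisperFeatureExtractor.from_pretrained("openai/whisper-small")`.

```python
import numpy as np
from transformers import WhisperFeatureExtractor

# Default configuration matches whisper-small: 80 Mel bins, 16 kHz, 30 s context.
feature_extractor = WhisperFeatureExtractor()

audio = np.zeros(16000, dtype=np.float32)  # 1 second of silence at 16 kHz

# The extractor pads the audio to 30 s and returns log-Mel features.
features = feature_extractor(audio, sampling_rate=16000, return_tensors="np")
print(features.input_features.shape)  # (1, 80, 3000): 80 Mel bins, 3000 frames
```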
## Evaluation Results
The model was evaluated on a held-out test set of 15,000 samples.
| Metric | Score |
|---|---|
| Word Error Rate (WER) | 13.69% |
| Character Error Rate (CER) | 3.43% |
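These metrics are usually computed with a library such as `jiwer` or `evaluate`, but the metric itself is just a normalized edit distance. A self-contained sketch of WER (CER is the same computation over characters instead of words):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between the processed prefix of ref and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev = d[0]          # previous row, previous column
        d[0] = i
        for j, h in enumerate(hyp, start=1):
            cur = d[j]       # previous row, current column (not yet overwritten)
            d[j] = min(d[j] + 1,           # deletion
                       d[j - 1] + 1,       # insertion
                       prev + (r != h))    # substitution (or match, cost 0)
            prev = cur
    return d[-1] / len(ref)
```

For example, `wer("a b c", "a x c")` is 1/3: one substitution over three reference words.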
## Usage

```python
from transformers import pipeline

transcriber = pipeline("automatic-speech-recognition", model="bishesMaharjan/whisper-small-nepali")
transcription = transcriber("path_to_nepali_audio.wav")
print(transcription["text"])
```
## Intended Uses
- Primary Use: Transcription of Nepali speech in real-time or offline scenarios.
- Limitations: Performance may vary on dialects or noisy audio not represented in the training set.
## Training Procedure
The model was fine-tuned using the Hugging Face `Seq2SeqTrainer` with a custom `DataCollatorSpeechSeq2SeqWithPadding`.
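The exact collator used here is not published; the sketch below follows the standard Whisper fine-tuning recipe for a class of that name, assuming `processor` is a `WhisperProcessor` (feature extractor plus tokenizer). It pads audio features and token labels separately, since they have very different lengths and padding rules:

```python
from dataclasses import dataclass
from typing import Any, Dict, List

import torch


@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    """Sketch of a speech seq2seq collator (assumed, not the card author's code)."""
    processor: Any
    decoder_start_token_id: int

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        # Audio side: log-Mel features are already a fixed 30 s window,
        # so this pad is effectively a batching step.
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")

        # Text side: pad label token ids to the longest sequence in the batch.
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")

        # Replace padding token ids with -100 so padded positions are ignored by the loss.
        labels = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100
        )

        # If tokenization already prepended the decoder start token, drop it here;
        # the model prepends it again during training.
        if (labels[:, 0] == self.decoder_start_token_id).all().item():
            labels = labels[:, 1:]

        batch["labels"] = labels
        return batch
```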
## Framework Versions
- Transformers 4.x
- PyTorch 2.x
- Datasets 2.x
- Tokenizers 0.x