# Whisper Small Mongolian ASR

Mongolian automatic speech recognition (ASR) model fine-tuned from openai/whisper-small using LoRA. This model is trained for Mongolian speech transcription on a custom-processed dataset.
## Model Details

### Base Model
- openai/whisper-small

### Fine-tuning Method
- LoRA (PEFT)

### Language
- Mongolian (`mn`)

### Task
- Automatic Speech Recognition (ASR)

### Hardware
- 2× NVIDIA T4 GPUs

### Frameworks
- Transformers
- PEFT
- PyTorch
- Accelerate
## Hyperparameters

| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| Effective batch size | 16 |
| Learning rate | 1e-3 |
| Epochs | 10 |
| FP16 | Enabled |
| LR scheduler | Linear |
| Optimizer | AdamW |
| Warmup steps | 500 |
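The table above maps directly onto a PEFT adapter configuration. A hypothetical sketch is shown below: only the rank comes from the card; the scaling factor, dropout, and target modules are assumptions (values commonly used when adapting Whisper attention layers), not the card author's exact settings.

```python
from peft import LoraConfig

# Sketch of a possible adapter config; r matches the table above,
# everything else is an assumed, illustrative value.
lora_config = LoraConfig(
    r=32,                                 # LoRA rank (from the hyperparameter table)
    lora_alpha=64,                        # assumed scaling factor
    lora_dropout=0.05,                    # assumed dropout
    target_modules=["q_proj", "v_proj"],  # common choice for Whisper attention projections
)
```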
## Dataset

Custom-processed Mongolian speech dataset.

- Total samples: 9,420

Text normalization includes:

- Unicode normalization (NFC)
- Lowercasing
- Cyrillic-only filtering
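The three normalization steps above can be sketched as a small preprocessing function. This is an illustration, not the card's actual pipeline; in particular, the exact Cyrillic-filtering regex (which letters and punctuation are kept) is an assumption. The character class includes `ө`, `ү`, and `ё`, which Mongolian Cyrillic uses but which fall outside the basic `а-я` range.

```python
import re
import unicodedata

# Assumed filter: keep only lowercase Cyrillic letters (incl. ө, ү, ё) and whitespace
CYRILLIC_RE = re.compile(r"[^а-яөүё\s]")

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # Unicode normalization (NFC)
    text = text.lower()                        # lowercasing
    text = CYRILLIC_RE.sub("", text)           # Cyrillic-only filtering
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace
```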
## Evaluation

| Step | Train Loss | Validation Loss | WER (%) | Time (Hours) |
|---|---|---|---|---|
| 400 | 2.1600 | 0.5055 | 53.69 | 1.236 |
| 800 | 1.1970 | 0.3291 | 37.79 | 3.071 |
| 1200 | 0.6450 | 0.2836 | 35.65 | 4.894 |
| 1600 | 0.4786 | 0.2752 | 29.74 | 6.730 |
| 2000 | 0.2775 | 0.2785 | 28.90 | 8.555 |
| 2400 | 0.2198 | 0.2853 | 28.77 | 9.782 |
| 2600 | 0.1858 | 0.2853 | 27.87 | 10.776 |
### Best Validation Metrics
| Metric | Value |
|---|---|
| Validation Loss | ~0.28 |
| WER | ~27.8% |
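The WER figures above are word-level Levenshtein distance divided by the number of reference words. A minimal pure-Python sketch of that metric (not necessarily the exact scorer used during training, which may apply extra normalization first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # cost of deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # cost of inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```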
## Usage

```python
from transformers import pipeline

# Load the fine-tuned Mongolian Whisper model as an ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="nrlt/whisper-mn-small-full2",
)

# Transcribe a local audio file
result = pipe("audio.wav")
print(result["text"])
```