Whisper Small Mongolian ASR

Mongolian automatic speech recognition (ASR) model fine-tuned from openai/whisper-small using LoRA.

The model was trained to transcribe Mongolian speech using a custom, preprocessed dataset.


Model Details

Base Model

  • openai/whisper-small

Fine-tuning Method

  • LoRA (PEFT)
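As a rough illustration of how small a LoRA adapter is next to the base model, the sketch below counts the adapter parameters for the rank of 32 listed under the hyperparameters. The layer sizes are those of openai/whisper-small (hidden size 768, 12 encoder layers, 12 decoder layers). The card does not say which modules were adapted, so the choice of q_proj and v_proj is an assumption borrowed from common PEFT Whisper recipes, and the result is only an estimate under that assumption.

```python
# Assumption (not stated in the model card): LoRA is applied to the q_proj and
# v_proj linear layers of every attention block.
D_MODEL = 768               # hidden size of openai/whisper-small
ENCODER_LAYERS = 12         # each encoder layer has one self-attention block
DECODER_LAYERS = 12         # each decoder layer has self- and cross-attention
RANK = 32                   # LoRA rank from the hyperparameter table
TARGETS_PER_ATTENTION = 2   # assumed: q_proj and v_proj


def lora_params_per_linear(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added to one linear layer: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params() -> int:
    """Total adapter parameters under the assumed target modules."""
    attention_blocks = ENCODER_LAYERS * 1 + DECODER_LAYERS * 2
    adapted_layers = attention_blocks * TARGETS_PER_ATTENTION
    return adapted_layers * lora_params_per_linear(D_MODEL, D_MODEL, RANK)


print(total_lora_params())
```

Under these assumptions the adapter holds about 3.5 million trainable parameters, a small fraction of the base model. Targeting other modules would change the count.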

Language

  • Mongolian (mn)

Task

  • Automatic Speech Recognition (ASR)

Hardware

  • 2 × NVIDIA T4 GPUs

Frameworks

  • Transformers
  • PEFT
  • PyTorch
  • Accelerate

Hyperparameters

Parameter        Value
LoRA Rank        32
Batch Size       16 (effective)
Learning Rate    1e-3
Epochs           10
FP16             Enabled
LR Scheduler     Linear
Optimizer        AdamW
Warmup Steps     500
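The learning rate and warmup values above can be read as a linear schedule with warmup: the rate climbs from 0 to the peak over the warmup steps, then decays linearly to 0. The sketch below shows that shape in plain Python. The total step count is not given in the card; 2600 is an assumption taken from the last step in the evaluation log, and the function name is illustrative rather than a library API.

```python
PEAK_LR = 1e-3        # learning rate from the hyperparameter table
WARMUP_STEPS = 500    # warmup steps from the hyperparameter table
TOTAL_STEPS = 2600    # assumption: last step reported in the evaluation log


def linear_schedule_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


# Print the learning rate at a few points along the run.
for step in (0, 250, 500, 1550, 2600):
    print(step, linear_schedule_lr(step))
```

With a peak of 1e-3 the rate is relatively high for fine-tuning, which is typical for LoRA because only the small adapter matrices are updated.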

Dataset

Custom processed Mongolian speech dataset.

Total samples: 9420

Text normalization includes:

  • Unicode normalization (NFC)
  • Lowercasing
  • Cyrillic-only filtering
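The three normalization steps can be sketched with the Python standard library as follows. The card does not define "Cyrillic-only" precisely, so this example assumes it means keeping characters in the Unicode Cyrillic block (U+0400 to U+04FF, which includes the Mongolian letters ө and ү) and replacing everything else with a space. The function name is hypothetical and the actual preprocessing may differ.

```python
import re
import unicodedata

# Assumption: "Cyrillic-only" keeps the Cyrillic block (U+0400 to U+04FF)
# plus whitespace; any other character is replaced by a space.
NON_CYRILLIC = re.compile(r"[^\u0400-\u04FF\s]")
WHITESPACE = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Apply NFC normalization, lowercasing, and Cyrillic-only filtering."""
    text = unicodedata.normalize("NFC", text)  # compose base letter + combining mark
    text = text.lower()
    text = NON_CYRILLIC.sub(" ", text)         # drop Latin, digits, punctuation
    return WHITESPACE.sub(" ", text).strip()   # collapse runs of whitespace


print(normalize_text("Сайн байна уу, Hello 2024!"))
```

Running NFC first matters here: a decomposed "й" (и followed by a combining breve) would otherwise lose its combining mark in the filtering step.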

Evaluation

Step    Train Loss    Validation Loss    WER (%)    Time (Hours)
400     2.1600        0.5055             53.69      1.236
800     1.1970        0.3291             37.79      3.071
1200    0.6450        0.2836             35.65      4.894
1600    0.4786        0.2752             29.74      6.730
2000    0.2775        0.2785             28.90      8.555
2400    0.2198        0.2853             28.77      9.782
2600    0.1858        0.2853             27.87      10.776
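For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the predicted transcript, divided by the number of reference words. The card does not say which tool produced the figures above, so the following is a minimal reference sketch of the standard definition, not the evaluation code used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One wrong word out of three reference words.
wer = word_error_rate("сайн байна уу", "сайн байн уу")
print(f"{100 * wer:.2f}%")
```

Because the score is computed on words, it depends on text normalization: reference and hypothesis should pass through the same normalization before comparison.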

Best Validation Metrics

Metric             Value
Validation Loss    0.2752 (step 1600)
WER                27.87% (step 2600)

Usage

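from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub.
pipe = pipeline(
    "automatic-speech-recognition",
    model="nrlt/whisper-mn-small-full2",
)

# Transcribe a local audio file; the transcript is under the "text" key.
result = pipe("audio.wav")
print(result["text"])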