# Whisper Small Mongolian ASR

Mongolian automatic speech recognition (ASR) model fine-tuned from openai/whisper-small using LoRA. This model is trained for Mongolian speech transcription on a custom-processed dataset.
## Model Details

### Base Model
- openai/whisper-small

### Fine-tuning Method
- LoRA (PEFT)

### Language
- Mongolian (`mn`)

### Task
- Automatic Speech Recognition (ASR)

### Hardware
- 2× NVIDIA T4 GPUs

### Frameworks
- Transformers
- PEFT
- PyTorch
- Accelerate
## Hyperparameters

| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| Effective batch size | 16 |
| Learning rate | 1e-3 |
| Epochs | 10 |
| FP16 | Enabled |
| LR scheduler | Linear |
| Optimizer | AdamW |
| Warmup steps | 500 |
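The table above maps directly onto a PEFT adapter configuration. A hypothetical sketch is shown below: only the rank comes from the card; the scaling factor, dropout, and target modules are assumptions (values commonly used when adapting Whisper attention layers), not the card author's exact settings.

```python
from peft import LoraConfig

# Sketch of a possible adapter config; r matches the table above,
# everything else is an assumed, illustrative value.
lora_config = LoraConfig(
    r=32,                                 # LoRA rank (from the hyperparameter table)
    lora_alpha=64,                        # assumed scaling factor
    lora_dropout=0.05,                    # assumed dropout
    target_modules=["q_proj", "v_proj"],  # common choice for Whisper attention projections
)
```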
## Dataset

Custom-processed Mongolian speech dataset.

- Total samples: 9,420

Text normalization includes:

- Unicode normalization (NFC)
- Lowercasing
- Cyrillic-only filtering
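The three normalization steps above can be sketched as a small preprocessing function. This is an illustration, not the card's actual pipeline; in particular, the exact Cyrillic-filtering regex (which letters and punctuation are kept) is an assumption. The character class includes `ө`, `ү`, and `ё`, which Mongolian Cyrillic uses but which fall outside the basic `а-я` range.

```python
import re
import unicodedata

# Assumed filter: keep only lowercase Cyrillic letters (incl. ө, ү, ё) and whitespace
CYRILLIC_RE = re.compile(r"[^а-яөүё\s]")

def normalize(text: str) -> str:
    text = unicodedata.normalize("NFC", text)  # Unicode normalization (NFC)
    text = text.lower()                        # lowercasing
    text = CYRILLIC_RE.sub("", text)           # Cyrillic-only filtering
    return re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace
```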
## Evaluation

| Step | Train Loss | Validation Loss | WER (%) | Time (Hours) |
|---|---|---|---|---|
| 400 | 2.1600 | 0.5055 | 53.69 | 1.236 |
| 800 | 1.1970 | 0.3291 | 37.79 | 3.071 |
| 1200 | 0.6450 | 0.2836 | 35.65 | 4.894 |
| 1600 | 0.4786 | 0.2752 | 29.74 | 6.730 |
| 2000 | 0.2775 | 0.2785 | 28.90 | 8.555 |
| 2400 | 0.2198 | 0.2853 | 28.77 | 9.782 |
| 2600 | 0.1858 | 0.2853 | 27.87 | 10.776 |
### Best Validation Metrics
| Metric | Value |
|---|---|
| Validation Loss | ~0.28 |
| WER | ~27.8% |
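The WER figures above are word-level Levenshtein distance divided by the number of reference words. A minimal pure-Python sketch of that metric (not necessarily the exact scorer used during training, which may apply extra normalization first):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # cost of deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # cost of inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```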
## Usage

```python
from transformers import pipeline

# Load the fine-tuned Mongolian Whisper model as an ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="nrlt/whisper-mn-small-full2",
)

# Transcribe a local audio file
result = pipe("audio.wav")
print(result["text"])
```