---
language:
- dv
- ar
- en
license: cc-by-nc-4.0
tags:
- automatic-speech-recognition
- mms
- ctc
- trilingual
- dhivehi
- arabic
- english
- madhaha
datasets:
- shiimi/dhivehi-audio-casts-processed
- Serialtechlab/dhivehi-mms-v5-combined
metrics:
- wer
base_model: Serialtechlab/mms-trilingual-dv-ar-en-v2
---
# MMS Trilingual ASR v3 - Dhivehi + Arabic + English (Madhaha Fix)
A fine-tuned version of `Serialtechlab/mms-trilingual-dv-ar-en-v2` with **fixed Madhaha recognition**.
## Problem Solved
The v2 model confused melodic Dhivehi (Madhaha, religious songs) with Arabic
and output Arabic script instead of Thaana. This version fixes that issue.
## Training Strategy
- Started from v2 model (preserves improved English/Arabic recognition)
- Trained ONLY on Dhivehi data (no Arabic interference)
- Oversampled melodic Dhivehi 3x to emphasize the pattern
- Higher learning rate (3e-05) to change associations aggressively
- 5 epochs for stronger reinforcement
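
The oversampling step can be sketched roughly as follows. This is a minimal illustration, not the actual training script: `build_training_set`, the sample dictionaries, and the `seed` parameter are hypothetical names, and only the 3x factor comes from this card.

```python
import random

def build_training_set(melodic, normal, factor=3, seed=0):
    """Repeat every melodic sample `factor` times, then shuffle with the normal samples."""
    combined = list(melodic) * factor + list(normal)
    random.Random(seed).shuffle(combined)  # fixed seed keeps the order reproducible
    return combined

# Toy stand-ins for dataset rows (hypothetical structure, not the real dataset schema)
melodic = [{"id": f"madhaha_{i}", "melodic": True} for i in range(2)]
normal = [{"id": f"speech_{i}", "melodic": False} for i in range(3)]

train = build_training_set(melodic, normal)
print(len(train))  # 2 melodic * 3 + 3 normal = 9
```

Repeating samples this way is the simplest form of oversampling; a weighted sampler would reach a similar class balance without duplicating rows.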
## Training Data
- Melodic Dhivehi: ~3000 samples (oversampled from audio casts)
- Normal Dhivehi: ~1500 samples
## Performance
- Final WER: 0.2153 (21.53%)
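
For reference, word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. A minimal pure-Python sketch follows; the `wer` helper is illustrative only, and the evaluation script behind the figure above is not described in this card.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words
score = wer("the cat sat on the mat", "the cat sit on mat")
print(round(score, 4))
```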
## Usage
```python
from transformers import AutoProcessor, Wav2Vec2ForCTC
import torch
processor = AutoProcessor.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v3")
model = Wav2Vec2ForCTC.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v3")
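# Process audio: audio_array is a 1-D float waveform sampled at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits
# Greedy CTC decoding: pick the most likely token at each frame
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]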
```
## Supported Languages
- Dhivehi (Thaana script) - including melodic/Madhaha
- Arabic (Arabic script) - preserved from v2
- English (Latin script) - preserved from v2 (improved Thaana transliteration)
## Changes from v2
- v3 specifically targets the Madhaha confusion issue
- Melodic Dhivehi now correctly outputs Thaana script
- Preserves v2's improved English and Arabic recognition
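
One way to spot-check the script fix on your own Madhaha recordings is to see which Unicode block dominates a transcription. The sketch below assumes the Thaana block (U+0780 to U+07BF) and the basic Arabic block (U+0600 to U+06FF); `dominant_script` is a hypothetical helper, not part of this model or of `transformers`.

```python
def dominant_script(text: str) -> str:
    """Return 'thaana', 'arabic', 'latin', or 'unknown' by counting letters per script."""
    counts = {"thaana": 0, "arabic": 0, "latin": 0}
    for ch in text:
        cp = ord(ch)
        if 0x0780 <= cp <= 0x07BF:      # Thaana block
            counts["thaana"] += 1
        elif 0x0600 <= cp <= 0x06FF:    # basic Arabic block
            counts["arabic"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"

# Escaped strings avoid right-to-left rendering issues in editors
thaana_text = "\u078b\u07a8\u0788\u07ac\u0780\u07a8"  # a word written in Thaana
arabic_text = "\u0633\u0644\u0627\u0645"              # a word written in Arabic script

print(dominant_script(thaana_text))
print(dominant_script(arabic_text))
```

Applied to the `transcription` string from the usage snippet, a result of `'arabic'` on a melodic Dhivehi clip would indicate the v2-style confusion.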