MMS Trilingual ASR v2 - Dhivehi + Arabic + English
A fine-tuned version of mms-trilingual-dv-ar-en with improved recognition of:
- Conversational Arabic (FLEURS Arabic)
- Melodic Dhivehi (Madhaha/podcasts)
Changes from v1
- Added conversational Arabic data (FLEURS) in place of the Quranic-only Arabic training data
- Added melodic Dhivehi (audio casts) to fix Madhaha confusion with Arabic
- Removed Quranic recitation data
Training Data
- Arabic: ~2500 samples from FLEURS (conversational)
- Dhivehi Melodic: 1000 samples from audio casts
- Dhivehi Normal: ~1500 samples
- English: ~500 samples from LibriSpeech
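To see how the training mix is balanced across languages, the counts above can be turned into rough shares. This is only a sketch: the counts are the approximate figures listed above, and the dictionary keys and the `mixture_shares` helper are names chosen here for illustration.

```python
# Approximate per-source sample counts, copied from the list above.
counts = {
    "Arabic (FLEURS)": 2500,
    "Dhivehi melodic": 1000,
    "Dhivehi normal": 1500,
    "English (LibriSpeech)": 500,
}


def mixture_shares(counts):
    """Return each source's share of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


shares = mixture_shares(counts)
for name, share in shares.items():
    # Shares are rough because the underlying counts are approximate.
    print(f"{name}: {share:.1%}")
```

Under these figures the two Dhivehi sources together make up roughly the same share as Arabic, with English the smallest component.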
Performance
- Final WER: 0.2820 (28.20%)
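For readers unfamiliar with the metric, the sketch below shows the standard word error rate definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The card does not state how the figure above was computed or on which evaluation set, so this is an illustration of the metric rather than the evaluation script; `word_error_rate` is a name chosen here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because splitting is on whitespace, the same function applies unchanged to Thaana and Arabic script transcriptions.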
Usage
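from transformers import AutoProcessor, Wav2Vec2ForCTC
import librosa
import torch

processor = AutoProcessor.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v2")
model = Wav2Vec2ForCTC.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v2")

# Load mono audio resampled to 16 kHz. "audio.wav" is a placeholder path, and librosa
# is one option: any loader that yields a 1-D float array at 16 kHz works.
audio_array, _ = librosa.load("audio.wav", sr=16000)

inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy decoding: pick the most likely token per frame, then decode to text
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]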
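The argmax followed by `batch_decode` above is greedy CTC decoding. As a rough illustration of what happens to the per-frame ids, the sketch below collapses consecutive repeats, drops the blank token, and maps the word delimiter to a space. The toy vocabulary, the `greedy_ctc_decode` name, and the choice of id 0 as blank are assumptions made for this example; the real ids come from the model's tokenizer.

```python
def greedy_ctc_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated ids, drop blanks, and join tokens into text."""
    tokens = []
    previous = None
    for token_id in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous = token_id
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary, not the model's actual one.
toy_vocab = {0: "<pad>", 1: "|", 2: "h", 3: "i"}

# Per-frame argmax ids: repeats collapse, and blanks separate repeated letters.
frames = [2, 2, 0, 3, 3, 1, 1, 2, 0, 3]
print(greedy_ctc_decode(frames, toy_vocab))
```

A blank between two identical ids is what lets a doubled letter survive the collapse step, which is why the blank token cannot simply be filtered out before merging repeats.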
Supported Languages
- Dhivehi (Thaana script) - including melodic/Madhaha
- Arabic (Arabic script) - conversational style
- English (Latin script)
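Since each language is written in a different script, one simple way to tell which language a transcription came out in is to count characters per Unicode block. The sketch below is an assumption-laden heuristic, not part of the model: it checks only the basic Thaana (U+0780 to U+07BF) and Arabic (U+0600 to U+06FF) blocks plus ASCII letters, and `dominant_script` is a name chosen here.

```python
def dominant_script(text):
    """Return "thaana", "arabic", or "latin" by character count, else None."""
    counts = {"thaana": 0, "arabic": 0, "latin": 0}
    for ch in text:
        code_point = ord(ch)
        if 0x0780 <= code_point <= 0x07BF:    # Thaana block
            counts["thaana"] += 1
        elif 0x0600 <= code_point <= 0x06FF:  # basic Arabic block
            counts["arabic"] += 1
        elif ch.isascii() and ch.isalpha():   # ASCII Latin letters
            counts["latin"] += 1
    if not any(counts.values()):
        return None  # no letters from any of the three scripts
    return max(counts, key=counts.get)


print(dominant_script("ދިވެހި"))
print(dominant_script("مرحبا"))
print(dominant_script("hello"))
```

A check like this can help spot the kind of Dhivehi and Arabic confusion described earlier, for example by flagging Dhivehi audio whose transcription comes back mostly in Arabic script.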