MMS Trilingual ASR v3 - Dhivehi + Arabic + English (Madhaha Fix)

Fine-tuned version of mms-trilingual-dv-ar-en-v2 that fixes Madhaha recognition.

Problem Solved

The v2 model confused melodic Dhivehi (Madhaha, i.e. religious songs) with Arabic and output Arabic script instead of Thaana. This version fixes that issue.
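
One way to spot this kind of script confusion in transcriptions is to count which Unicode block the characters fall in. The sketch below is illustrative only and not part of the model or its training code; the function names are hypothetical, while the block ranges (Thaana U+0780 to U+07BF, Arabic U+0600 to U+06FF) are the standard Unicode ones.

```python
from collections import Counter

# Standard Unicode block ranges (inclusive).
THAANA = (0x0780, 0x07BF)
ARABIC = (0x0600, 0x06FF)


def script_counts(text: str) -> Counter:
    """Count characters per script; spaces, digits and punctuation are ignored."""
    counts = Counter()
    for ch in text:
        cp = ord(ch)
        if THAANA[0] <= cp <= THAANA[1]:
            counts["thaana"] += 1
        elif ARABIC[0] <= cp <= ARABIC[1]:
            counts["arabic"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    return counts


def dominant_script(text: str) -> str:
    """Return the most frequent script in `text`, or "unknown" if none is found."""
    counts = script_counts(text)
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]
```

Running `dominant_script` over the transcriptions of known Madhaha clips would flag any that still come out mostly in Arabic script.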

Training Strategy

  • Started from v2 model (preserves improved English/Arabic recognition)
  • Trained ONLY on Dhivehi data (no Arabic interference)
  • Oversampled melodic Dhivehi 3x to emphasize the pattern
  • Higher learning rate (3e-05) to change associations aggressively
  • 5 epochs for stronger reinforcement
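
The oversampling step can be sketched as simple list repetition. This is a minimal illustration under assumptions, not the actual training script: `build_training_list` and `HPARAMS` are hypothetical names, and only the factor of 3, the learning rate and the epoch count come from the list above.

```python
import random


def build_training_list(melodic, normal, factor=3, seed=0):
    """Repeat every melodic sample `factor` times, add the normal samples, then shuffle."""
    combined = list(melodic) * factor + list(normal)
    random.Random(seed).shuffle(combined)  # fixed seed keeps the order reproducible
    return combined


# Hypothetical hyperparameter dict mirroring the values stated above.
HPARAMS = {"learning_rate": 3e-5, "num_train_epochs": 5}
```

With two melodic clips and one normal clip, `build_training_list(["m1", "m2"], ["n1"])` yields seven entries, three of which are `"m1"`.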

Training Data

  • Melodic Dhivehi: ~3000 samples (oversampled from audio casts)
  • Normal Dhivehi: ~1500 samples

Performance

  • Final WER: 0.2153 (21.53%)
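
For reference, word error rate is the word-level edit distance (substitutions, insertions and deletions) divided by the number of reference words. The card does not say which evaluation set or tool produced the figure above, so the following is only a generic sketch of the metric for a single sentence pair; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A corpus-level score is usually computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than by averaging per-sentence values.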

Usage

from transformers import AutoProcessor, Wav2Vec2ForCTC
import torch

processor = AutoProcessor.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v3")
model = Wav2Vec2ForCTC.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v3")

# audio_array: 1-D float waveform, mono, sampled at 16 kHz (resample first if needed)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
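
The `argmax` plus `batch_decode` steps above amount to greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop the blank token. The toy sketch below shows that collapse on plain integer ids. The vocabulary, the blank id of 0 and the `|` word delimiter are assumptions for illustration (they follow the usual Wav2Vec2 CTC tokenizer convention) and are not read from this model's tokenizer.

```python
from itertools import groupby

# Hypothetical toy vocabulary; the real model uses its own tokenizer vocabulary.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "a", 3: "b"}


def ctc_greedy_collapse(ids, blank_id=0):
    """Collapse runs of identical ids, then remove the blank id."""
    collapsed = [key for key, _ in groupby(ids)]
    return [i for i in collapsed if i != blank_id]


def decode(ids, vocab=TOY_VOCAB, blank_id=0, word_delimiter="|"):
    """Map collapsed ids to text, turning the word delimiter into spaces."""
    tokens = [vocab[i] for i in ctc_greedy_collapse(ids, blank_id)]
    return "".join(tokens).replace(word_delimiter, " ").strip()
```

Note that a blank between two identical ids keeps both of them, which is how CTC represents doubled letters: `[2, 2, 0, 2]` collapses to `[2, 2]`.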

Supported Languages

  • Dhivehi (Thaana script) - including melodic/Madhaha
  • Arabic (Arabic script) - preserved from v2
  • English (Latin script) - preserved from v2 (improved Thaana transliteration)

Changes from v2

  • v3 specifically targets the Madhaha confusion issue
  • Melodic Dhivehi now correctly outputs Thaana script
  • Preserves v2's improved English and Arabic recognition

Model size: 1.0B params (Safetensors, F32)