MMS Trilingual ASR v2 - Dhivehi + Arabic + English

A fine-tuned version of mms-trilingual-dv-ar-en with improved recognition of:

  • Conversational Arabic (trained on FLEURS Arabic)
  • Melodic Dhivehi (Madhaha and podcasts)

Changes from v1

  • Added conversational Arabic data (FLEURS) to replace Quranic-only training
  • Added melodic Dhivehi data (audio casts) to fix Madhaha being misrecognized as Arabic
  • Removed Quranic recitation data

Training Data

  • Arabic: ~2500 samples from FLEURS (conversational)
  • Dhivehi Melodic: 1000 samples from audio casts
  • Dhivehi Normal: ~1500 samples
  • English: ~500 samples from LibriSpeech
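
To make the balance of the training mix easier to see, the approximate counts listed above can be turned into shares. This is only arithmetic on the figures in this card; the counts are approximate, so the percentages are too, and the dictionary keys are labels chosen for illustration.

```python
# Approximate sample counts as listed in this card.
counts = {
    "Arabic (FLEURS)": 2500,
    "Dhivehi melodic": 1000,
    "Dhivehi normal": 1500,
    "English (LibriSpeech)": 500,
}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")

# Combined Dhivehi share (melodic + normal).
dhivehi_share = shares["Dhivehi melodic"] + shares["Dhivehi normal"]
print(f"Dhivehi total: {dhivehi_share:.1%}")
```

Under these counts, Arabic and Dhivehi each make up roughly 45% of the mix, with English at roughly 9%.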

Performance

  • Final WER: 0.2820 (28.20%)
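
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below implements that standard definition in plain Python. This card does not state which tool or evaluation set produced the figure above, so `word_error_rate` is an illustrative helper, not the evaluation code used for this model.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{example:.4f}")
```

Read this way, a WER of 0.2820 corresponds to about 28 word errors per 100 reference words.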

Usage

from transformers import AutoProcessor, Wav2Vec2ForCTC
import torch

# Load the processor (feature extractor + tokenizer) and the CTC model
processor = AutoProcessor.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v2")
model = Wav2Vec2ForCTC.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v2")

# audio_array: mono waveform as a 1-D float array, sampled at 16 kHz
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token per frame, then decode to text
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
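
The last two lines of the usage snippet perform greedy CTC decoding: the per-frame argmax yields one token id per audio frame, and decoding collapses consecutive repeats and drops the blank token. The sketch below shows that collapsing step in plain Python on a toy example. The vocabulary, the blank id of 0, and the function name `ctc_greedy_collapse` are assumptions for illustration; the real tokenizer has its own vocabulary and also handles word delimiters and special tokens.

```python
def ctc_greedy_collapse(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame ids and remove blanks (greedy CTC decoding)."""
    tokens = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if idx != previous and idx != blank_id:
            tokens.append(id_to_token[idx])
        previous = idx
    return "".join(tokens)


# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
toy_vocab = {0: "<blank>", 1: "h", 2: "i"}

collapsed = ctc_greedy_collapse([1, 1, 0, 2, 2, 0, 0], toy_vocab)
doubled = ctc_greedy_collapse([1, 0, 1], toy_vocab)
print(collapsed, doubled)
```

The second call shows why the blank matters: a blank between two identical ids keeps them as two separate characters instead of merging them into one.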

Supported Languages

  • Dhivehi (Thaana script) - including melodic/Madhaha
  • Arabic (Arabic script) - conversational style
  • English (Latin script)