Serialtechlab/dhivehi-mms-v5-combined
Fine-tuned from mms-trilingual-dv-ar-en-v2 to fix the recognition of Madhaha.
The v2 model confused melodic Dhivehi (Madhaha, religious songs) with Arabic and transcribed it in Arabic script instead of Thaana. This version fixes that issue.
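One way to check whether a transcription came out in Thaana or in Arabic script is to count letters by Unicode block. The sketch below uses two blocks: Thaana (U+0780 to U+07BF) and the basic Arabic block (U+0600 to U+06FF). The helpers `script_counts` and `dominant_script` are hypothetical and are not part of this model or of `transformers`. The sketch only counts alphabetic characters, so Thaana fili, Arabic harakat and the Arabic comma used in Dhivehi text do not skew the result.

```python
# Hypothetical helpers for spotting Arabic-script output on Dhivehi audio.
THAANA = range(0x0780, 0x07C0)  # Thaana block, U+0780..U+07BF
ARABIC = range(0x0600, 0x0700)  # basic Arabic block only, U+0600..U+06FF


def script_counts(text: str) -> dict:
    """Count Thaana, Arabic and Latin letters (vowel marks and punctuation are skipped)."""
    counts = {"thaana": 0, "arabic": 0, "latin": 0}
    for ch in text:
        if not ch.isalpha():  # skips combining marks (Mn), digits, punctuation
            continue
        cp = ord(ch)
        if cp in THAANA:
            counts["thaana"] += 1
        elif cp in ARABIC:
            counts["arabic"] += 1
        elif ch.isascii():
            counts["latin"] += 1
    return counts


def dominant_script(text: str) -> str:
    """Return the script with the most letters, or "unknown" if none were found."""
    counts = script_counts(text)
    if not any(counts.values()):
        return "unknown"
    return max(counts, key=counts.get)
```

For example, `dominant_script(transcription)` on a Madhaha recording should return `"thaana"` rather than `"arabic"` if the fix described above holds for that clip.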
from transformers import AutoProcessor, Wav2Vec2ForCTC
import librosa
import torch

processor = AutoProcessor.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v3")
model = Wav2Vec2ForCTC.from_pretrained("Serialtechlab/mms-trilingual-dv-ar-en-v3")

# Load audio as a mono float waveform resampled to 16 kHz ("audio.wav" is a placeholder path)
audio_array, _ = librosa.load("audio.wav", sr=16000)
inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: pick the most likely token per frame, then let the processor collapse it to text
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
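The last two lines perform greedy CTC decoding. The sketch below is a simplified illustration of what `batch_decode` does with the per-frame IDs under its default settings. It is not the library's implementation. It makes two assumptions:

* The blank token is the tokenizer's pad token. The value `blank_id=0` here is only an example.
* `"|"` is the word delimiter, as in Wav2Vec2-style vocabularies.

The sketch merges consecutive repeats, drops blanks, and turns the delimiter into spaces. The function name `ctc_greedy_collapse` and the toy vocabulary are hypothetical.

```python
def ctc_greedy_collapse(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame argmax IDs into text, as in greedy CTC decoding."""
    tokens = []
    prev = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank;
        # a blank between two identical tokens therefore keeps both.
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Toy vocabulary (assumed): 0 is the blank/pad token, 3 is the word delimiter
vocab = {0: "<pad>", 1: "a", 2: "b", 3: "|"}
print(ctc_greedy_collapse([1, 1, 0, 1, 3, 2, 2], vocab))
```

This prints `aa b`. The repeated `1, 1` collapses to one `a`, and the blank between it and the next `1` keeps a second `a`.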
Base model: Serialtechlab/mms-trilingual-dv-ar-en