This model is a fine-tuned version of facebook/mms-1b-all on the atlasia/darija_bible_aligned dataset for Moroccan Arabic (Darija) speech recognition.
from transformers import AutoProcessor, AutoModelForCTC
import torch
import librosa

# Load model and processor
processor = AutoProcessor.from_pretrained("HAMMALE/mms-darija-finetuned")
model = AutoModelForCTC.from_pretrained("HAMMALE/mms-darija-finetuned")

# Load audio and resample to the 16 kHz rate the model expects
audio, sr = librosa.load("path/to/darija/audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(f"Transcription: {transcription}")
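The argmax step above performs greedy CTC decoding: `processor.batch_decode` collapses runs of repeated predictions and drops the blank token before mapping IDs back to text. A minimal sketch of that collapse step, with illustrative token IDs and blank index (not the model's actual vocabulary):

```python
def ctc_collapse(ids, blank_id=0):
    """Greedy CTC post-processing: merge repeated IDs, then drop blanks."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:  # keep only the first ID of each run
            out.append(i)
        prev = i
    return out

# Frame-level predictions [0, 7, 7, 0, 3, 3, 3, 0, 7] collapse to [7, 3, 7]:
# repeats merge within a run, and blanks (0) separate genuine repetitions.
print(ctc_collapse([0, 7, 7, 0, 3, 3, 3, 0, 7]))  # → [7, 3, 7]
```

Note that the blank between the two 7s is what allows the same token to appear twice in the output; without it, the repeats would merge into one.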
The model was fine-tuned on the Darija Bible Aligned Dataset, which contains audio segments from the Moroccan Standard Translation (MSTD) of the Bible with aligned text transcriptions.
@misc{darija-mms-finetuned,
  title={MMS-1B-All Fine-tuned on Darija Bible Dataset},
  author={HAMMALE},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Model Hub},
  howpublished={\url{https://huggingface.co/HAMMALE/mms-darija-finetuned}}
}