# MMS-1B-All fine-tuned on Darija Bible Aligned Dataset
This model is a fine-tuned version of `facebook/mms-1b-all` for Moroccan Arabic (Darija) speech recognition, trained on the Darija Bible Aligned Dataset provided by AtlasAI.
## Dataset
- Name: `atlasia/darija_bible_aligned`
- Domain: religious texts (Moroccan Darija audio → Arabic text)
- License: see the original dataset page.
## Intended use
This model is intended for research and experimentation in low-resource Arabic dialect ASR.
## Training Details
- Base model: `facebook/mms-1b-all`
- Language: `ara` (Moroccan Arabic)
- Framework: `transformers` + `datasets`
- WER on eval set: to be filled after training
## Acknowledgements
Special thanks to AtlasAI for providing the aligned Darija Bible dataset.
## Demo
Try the model on your own audio! Check out the demo in the Space or use this snippet:
```python
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

processor = AutoProcessor.from_pretrained("your-username/mms-darija-finetuned")
model = AutoModelForCTC.from_pretrained("your-username/mms-darija-finetuned")

# Load audio and resample to the 16 kHz rate MMS models expect
waveform, sr = torchaudio.load("your_audio.wav")
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

inputs = processor(waveform.squeeze().numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```