Hallucinations on Quran and Hadith!

by DrAliGomaa - opened May 28, 2025

I tested your model on the Quran and Hadith data in the [Dr Ali Gomaa dataset](https://huggingface.co/datasets/DrAliGomaa/arabic_quran_hadith14books_cmvoice17_fleurs_mediaspeech), and it appears to be overfitted to media speech. I think this is due to the extensive training on the MGB-2 dataset. Why not fine-tune on many datasets, including this Hadith/Quran dataset, to achieve a better WER? This won't affect the WER on MGB-2, and it may help with accuracy and generalization.
An Arabic model that performs very badly on Quran and Hadith is effectively meaningless. On this dataset, Whisper without fine-tuning has a 3.5% WER on Quran and 7.50% on Hadith.
I encourage your work, but your model is effectively worthless if its performance is bad on Quran/Hadith. MSA is about Quran and Hadith; they are the source of truth for the Arabic language. Your model excels in speed, but I will never use something that is degraded on God's words and those of his prophet.
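
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a word-level WER like the figures above is typically computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative and not taken from any evaluation script. The sketch only splits on whitespace, and that is an assumption: it does no text normalization, and whether diacritics are stripped before scoring can change the number considerably for fully diacritized Quran text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and no normalization
    (diacritics, punctuation, letter variants) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Averaging this per utterance is not the same as dividing the total number of errors by the total number of reference words over the whole test set, so it is worth stating which of the two a reported WER uses.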
