Hallucinations on Quran and Hadith!
I tested your model on the Quran and Hadith dataset by [Dr Ali Gomaa](https://huggingface.co/datasets/DrAliGomaa/arabic_quran_hadith14books_cmvoice17_fleurs_mediaspeech), and it appears to be overfitting to MediaSpeech. I think this is due to the extensive training on the MGB2 dataset. Why not fine-tune on several datasets, including this Quran/Hadith dataset, to achieve a better WER? This should not hurt the WER on MGB2, and it may improve accuracy and generalization.
An Arabic model that performs very badly on the Quran and Hadith is essentially meaningless. Whisper without fine-tuning reaches 3.5% WER on the Quran portion of this dataset and 7.50% on the Hadith portion.
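For anyone who wants to reproduce a comparison like this, WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model output, divided by the number of reference words. Below is a minimal pure-Python sketch; the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no Arabic text normalization (such as diacritic removal), which the figures above may or may not have used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization only, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("بسم الله الرحمن الرحيم", "بسم الله الرحمان الرحيم"))
```

Running the script prints `0.25`. Note that a corpus-level WER is usually computed by summing the edit distances over all utterances and dividing by the total number of reference words, rather than averaging per-utterance scores.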
I encourage your work, but your model is worthless to me if its performance on the Quran and Hadith is bad. MSA is rooted in the Quran and Hadith; they are the source of truth for the Arabic language. Your model excels in speed, but I will never use something that degrades on the words of God and His Prophet.