Question about the correct NeMo ASR model and configuration for Arabic / Quran recitation (streaming & fine-tuning)

by NovaCon-AI - opened Dec 18, 2025


Assalamu aleykum wa rahmatullahi wa barakatuhu (peace be upon you, and the mercy and blessings of God),

I am currently working on an Arabic ASR project focused specifically on Quran recitation (tajweed-style, clear recitation, not conversational speech).

My goals are:
• Automatic Speech Recognition (ASR) for Quran recitation
• Ideally suitable for streaming / near-real-time use
• With the option to fine-tune on a custom Quran recitation dataset

While exploring NeMo-based setups (including this repository / endpoint), I am running into a fundamental problem:
I cannot clearly determine which ASR model this setup is intended to use, or what configuration is recommended for this use case.

Specifically, I am unclear about:
• Which NeMo ASR model is recommended for Arabic Quran recitation (CTC vs RNNT, Conformer vs Parakeet, etc.)
• Whether this setup is intended for offline or streaming ASR
• How the model is selected (for example via config.repository or environment variables)
• What a correct and stable configuration would be for fine-tuning on Arabic Quran data
• Whether there are known limitations or best practices for Quran-style recitation (long vowels, pauses, tajweed)

At the moment, the code and Docker setup do not make it obvious which pretrained model is expected, and the README does not clarify this either.

I would really appreciate guidance on:
• The correct model choice
• A recommended configuration
• Whether this approach is suitable for fine-tuning on Quran recitation data

Barakallahu feek (may God bless you)!
