stt_arabic_quartznet15x5_v1

Model Overview

Model Type: Automatic Speech Recognition (ASR)
Language: Arabic (Quranic Arabic with diacritics)
Developed by: Muhammad Haris Waqar, Tahir Ahmed Khan

This is a QuartzNet15x5 ASR model built with NVIDIA NeMo and fine-tuned for Quranic recitation and Qaida-based Arabic pronunciation.
It is tuned for Arabic phonetics, diacritics (Tashkeel), and religious recitation styles.

Model Architecture

Base Architecture: QuartzNet15x5
Framework: NVIDIA NeMo
Encoder: Time-Channel Separable Convolutions
Decoder: CTC
Tokenizer: Character-level (Arabic with diacritics)
Audio Features: 64-dimensional Mel-spectrogram
Sample Rate: 16 kHz
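
To illustrate what a character-level vocabulary with diacritics implies, the sketch below splits a diacritized word into Unicode code points using only the Python standard library. It is not the model's actual tokenizer: the helper names `split_characters` and `strip_diacritics` are hypothetical, and the code assumes the diacritics are encoded as Unicode combining marks (category `Mn`), which is how the standard Arabic Tashkeel code points are classified.

```python
import unicodedata
from typing import List


def split_characters(text: str) -> List[str]:
    """Return the code points of `text`, one per list item."""
    return list(text)


def strip_diacritics(text: str) -> str:
    """Drop combining marks (Unicode category Mn), such as Arabic Tashkeel."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# "bismi": ba + kasra, sin + sukun, mim + kasra
word = "\u0628\u0650\u0633\u0652\u0645\u0650"

print(len(split_characters(word)))  # 6 code points: 3 letters + 3 marks
print(len(strip_diacritics(word)))  # 3 letters remain
```

With a character-level output, each diacritic is a separate symbol the model has to predict, so a diacritized transcript is roughly twice as long as its undiacritized form.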

Intended Use

This model is suitable for:

Quranic recitation transcription
Qaida-based Arabic learning systems
Pronunciation evaluation and feedback
Educational and religious ASR applications
Arabic speech recognition with diacritics
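
For pronunciation feedback, one possible approach is to compare the model's transcript with the expected text character by character. The sketch below uses `difflib` from the standard library. The function name `char_differences` and the example strings are hypothetical, and this is only one way such feedback could be derived, not a method described by the authors.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def char_differences(reference: str, transcript: str) -> List[Tuple[str, str, str]]:
    """List (operation, expected, recognized) for every non-matching span."""
    matcher = SequenceMatcher(None, reference, transcript, autojunk=False)
    return [
        (tag, reference[i1:i2], transcript[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


reference = "\u0628\u0650\u0633\u0652\u0645\u0650"   # expected: kasra on the first letter
transcript = "\u0628\u064e\u0633\u0652\u0645\u0650"  # recognized: fatha on the first letter

for tag, expected, got in char_differences(reference, transcript):
    # Print code points so the combining marks are easy to tell apart
    print(tag, [hex(ord(c)) for c in expected], [hex(ord(c)) for c in got])
```

Here the only reported difference is the vowel mark on the first letter, which is the kind of detail a learner-facing application might surface.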

Training Details

Dataset

Custom Qaida Quranic Arabic dataset
Diacritized Arabic transcriptions
Carefully curated religious audio content

Training Configuration

Optimizer: Novograd
Learning Rate: 0.01 (polynomial decay)
Batch Size: 32
Precision: FP32
Spectrogram Augmentation: Enabled
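
As a rough illustration of how a polynomial decay schedule reduces the learning rate from 0.01, the sketch below implements a generic form of the schedule. The power, the minimum learning rate, and the total number of steps are not given in this card, so the values used here are assumptions, and the function `polynomial_decay` is a hypothetical helper rather than the NeMo scheduler itself.

```python
def polynomial_decay(step: int, max_steps: int, base_lr: float = 0.01,
                     min_lr: float = 0.0, power: float = 2.0) -> float:
    """Generic polynomial decay from base_lr to min_lr over max_steps.

    base_lr comes from the model card; min_lr, power and max_steps are assumed.
    """
    step = min(step, max_steps)        # hold at min_lr once the schedule ends
    remaining = 1.0 - step / max_steps
    return (base_lr - min_lr) * remaining ** power + min_lr


for step in (0, 500, 1000):
    print(step, polynomial_decay(step, max_steps=1000))
```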

Best Checkpoint

Checkpoint: QuartzNet15x5--val_wer=0.1131-epoch=61.ckpt
Epoch: 61
Validation WER: 0.1131
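
A validation WER of 0.1131 corresponds to roughly 11 word errors per 100 reference words. The sketch below shows the conventional corpus-level definition (word-level edit distance divided by the number of reference words) in plain Python. The function names are hypothetical, and this is not claimed to be the exact routine used during training.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(references: List[str], hypotheses: List[str]) -> float:
    """Total word-level edit distance divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


# One substitution (b -> x) and one deletion (d) over 4 reference words
print(wer(["a b c d"], ["a x c"]))  # 0.5
```

Because words are compared as whole strings, a single incorrect diacritic in a diacritized transcript counts the entire word as an error.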

Usage

Installation

pip install "nemo_toolkit[asr]"

Inference Example

import nemo.collections.asr as nemo_asr

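# Restore the model from a local .nemo checkpoint
asr_model = nemo_asr.models.ASRModel.restore_from("stt_arabic_quartznet15x5_v1.nemo")

# transcribe() returns one result per input file
transcription = asr_model.transcribe(["audio.wav"])
result = transcription[0]
# Depending on the NeMo version, the result is a plain string or an object with a .text field
print(getattr(result, "text", result))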
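
Since the model card lists a 16 kHz sample rate, it can help to inspect a file before transcribing it. The sketch below uses the standard-library `wave` module. The helper name `describe_wav` is hypothetical, and it only handles uncompressed PCM WAV files.

```python
import wave
from typing import Dict, Union

EXPECTED_SAMPLE_RATE = 16000  # 16 kHz, as listed in the model card


def describe_wav(path: str) -> Dict[str, Union[int, float, bool]]:
    """Report basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
            "matches_model": rate == EXPECTED_SAMPLE_RATE,
        }
```

Calling `describe_wav("audio.wav")` on a 44.1 kHz recording, for example, would report `matches_model` as `False`, which suggests resampling the file first.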

Hugging Face Hub Usage

from huggingface_hub import hf_hub_download
import nemo.collections.asr as nemo_asr

model_path = hf_hub_download(
    repo_id="9DTechnologies/QuartzNet_quran_v1",
    filename="stt_arabic_quartznet15x5_v1.nemo"
)

asr_model = nemo_asr.models.ASRModel.restore_from(model_path)

Requirements

Python >= 3.8
torch >= 2.0
nemo_toolkit[asr] >= 2.0
torchaudio
librosa
soundfile

Authors

Muhammad Haris Waqar, Tahir Ahmed Khan

Citation

@misc{quartznet_quranic_asr_2026,
  title={QuartzNet15x5 for Quranic Arabic Speech Recognition},
  author={Muhammad Haris Waqar and Tahir Ahmed Khan},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/9DTechnologies/QuartzNet_quran_v1}}
}

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Evaluation results

Word Error Rate (Validation) on Qaida Quranic Arabic Test Set (self-reported): 0.113