stt_arabic_quartznet15x5_v1
Model Overview
Model Type: Automatic Speech Recognition (ASR)
Language: Arabic (Quranic Arabic with diacritics)
Developed by: Muhammad Haris Waqar, Tahir Ahmed Khan
This is a QuartzNet15x5 ASR model built with NVIDIA NeMo and fine-tuned specifically for Quranic recitation and Qaida-based Arabic pronunciation.
It is optimized to handle Arabic phonetics, diacritics (Tashkeel), and religious recitation styles.
Model Architecture
- Base Architecture: QuartzNet15x5
- Framework: NVIDIA NeMo
- Encoder: Time-Channel Separable Convolutions
- Decoder: CTC
- Tokenizer: Character-level (Arabic with diacritics)
- Audio Features: 64-dimensional Mel-spectrogram
- Sample Rate: 16 kHz
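To illustrate how a CTC decoder with a character-level tokenizer turns per-frame predictions into text, here is a minimal greedy (best-path) decoding sketch. The four-symbol vocabulary and the frame sequence are hypothetical, and placing the blank at the last index is an assumption made for this example; the real model's vocabulary lives in its configuration.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], vocab: Sequence[str], blank_id: int) -> str:
    """Collapse repeated frame labels, then drop blanks (CTC best-path rule)."""
    chars: List[str] = []
    previous = None
    for idx in frame_ids:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            chars.append(vocab[idx])
        previous = idx
    return "".join(chars)


# Hypothetical vocabulary: ba, seen, kasra, sukun. Diacritics are ordinary tokens.
VOCAB = ["\u0628", "\u0633", "\u0650", "\u0652"]
BLANK = len(VOCAB)  # assumption: blank is the last index

# Hypothetical argmax output, one label per acoustic frame.
frames = [0, 0, BLANK, 2, 2, 1, BLANK, 3, 3]
decoded = ctc_greedy_decode(frames, VOCAB, BLANK)
print(decoded)
```

Because each diacritic is its own token, a recitation error on a single haraka shows up as a single-character difference in the output.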
Intended Use
This model is suitable for:
- Quranic recitation transcription
- Qaida-based Arabic learning systems
- Pronunciation evaluation and feedback
- Educational and religious ASR applications
- Arabic speech recognition with diacritics
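For pronunciation feedback, it can help to separate errors in the base letters from errors that affect only the diacritics. The sketch below is one possible approach, not part of the model itself; the helper names `strip_tashkeel` and `compare_recitation` are hypothetical. It relies on the fact that Arabic harakat are Unicode combining marks (category `Mn`).

```python
import unicodedata


def strip_tashkeel(text: str) -> str:
    """Remove combining marks (category Mn), which covers Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def compare_recitation(reference: str, hypothesis: str) -> str:
    """Classify a transcript against the expected diacritized text."""
    if reference == hypothesis:
        return "exact"
    if strip_tashkeel(reference) == strip_tashkeel(hypothesis):
        # Same letters, so the mismatch is in the diacritics only.
        return "diacritics_only"
    return "letters_differ"


expected = "\u0628\u0650\u0633\u0652\u0645\u0650"  # ba+kasra, seen+sukun, meem+kasra
heard = "\u0628\u064E\u0633\u0652\u0645\u0650"     # same letters, fatha on the ba
print(compare_recitation(expected, heard))
```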
Training Details
Dataset
- Custom Qaida Quranic Arabic dataset
- Diacritized Arabic transcriptions
- Carefully curated religious audio content
Training Configuration
- Optimizer: Novograd
- Learning Rate: 0.01 (polynomial decay)
- Batch Size: 32
- Precision: FP32
- Spectrogram Augmentation: Enabled
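The polynomial decay schedule listed above can be sketched as follows. Only the base learning rate of 0.01 comes from this card; the power, the minimum learning rate, the step count, and the absence of warmup are assumptions chosen for illustration.

```python
def polynomial_decay_lr(
    step: int,
    max_steps: int,
    base_lr: float = 0.01,   # from the training configuration above
    min_lr: float = 0.0,     # assumption: not stated in this card
    power: float = 2.0,      # assumption: not stated in this card
) -> float:
    """Learning rate after `step` updates under polynomial decay."""
    progress = min(step, max_steps) / max_steps
    return (base_lr - min_lr) * (1.0 - progress) ** power + min_lr


# Hypothetical run length of 1000 steps.
for step in (0, 250, 500, 1000):
    print(step, polynomial_decay_lr(step, max_steps=1000))
```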
Best Checkpoint
- Checkpoint: QuartzNet15x5--val_wer=0.1131-epoch=61.ckpt
- Epoch: 61
- Validation WER: 0.1131 (11.31%)
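For reference, word error rate is the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The following is a minimal single-utterance sketch; a validation figure like the one above is normally aggregated over the whole set, and the exact evaluation script used for this model is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Since the transcripts are diacritized, a word with one wrong haraka counts as a full word error under this metric.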
Usage
Installation
pip install "nemo_toolkit[asr]"
Inference Example
import nemo.collections.asr as nemo_asr
# Restore the model from a local .nemo file
asr_model = nemo_asr.models.ASRModel.restore_from("stt_arabic_quartznet15x5_v1.nemo")
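transcription = asr_model.transcribe(["audio.wav"])
# Depending on the NeMo version, each result is a plain string or a Hypothesis object with a .text attribute
first = transcription[0]
print(first.text if hasattr(first, "text") else first)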
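Since the model works on 16 kHz audio, it can be useful to inspect a file before transcribing it. The sketch below uses only the standard-library `wave` module, so it handles PCM WAV files only. The helper name `check_wav` is hypothetical, and the single-channel requirement is an assumption rather than something stated in this card.

```python
import wave
from typing import List


def check_wav(path: str, expected_rate: int = 16000) -> List[str]:
    """Return a list of problems found in a PCM WAV file (empty if none)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems: List[str] = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        # Assumption: mono input; the card does not state a channel requirement.
        problems.append(f"file has {channels} channels, expected 1")
    return problems
```

Calling `check_wav("audio.wav")` returns an empty list when the file matches; otherwise the messages indicate whether the audio should be resampled or downmixed first.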
Hugging Face Hub Usage
from huggingface_hub import hf_hub_download
import nemo.collections.asr as nemo_asr
model_path = hf_hub_download(
repo_id="9DTechnologies/QuartzNet_quran_v1",
filename="stt_arabic_quartznet15x5_v1.nemo"
)
asr_model = nemo_asr.models.ASRModel.restore_from(model_path)
Requirements
- Python >= 3.8
- torch >= 2.0
- nemo_toolkit[asr] >= 2.0
- torchaudio
- librosa
- soundfile
Authors
- Muhammad Haris Waqar, Tahir Ahmed Khan
Citation
@misc{quartznet_quranic_asr_2026,
title={QuartzNet15x5 for Quranic Arabic Speech Recognition},
author={Muhammad Haris Waqar and Tahir Ahmed Khan},
year={2026},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/9DTechnologies/QuartzNet_quran_v1}}
}
License
Creative Commons Attribution 4.0 International (CC BY 4.0)
Evaluation results
- Word Error Rate (Validation) on Qaida Quranic Arabic Test Set (self-reported): 0.113