stt_arabic_quartznet15x5_v1

Model Overview

Model Type: Automatic Speech Recognition (ASR)
Language: Arabic (Quranic Arabic with diacritics)
Developed by: Muhammad Haris Waqar, Tahir Ahmed Khan

This model is a QuartzNet15x5-based NVIDIA NeMo ASR model, fine-tuned specifically for Quranic recitation and Qaida-based Arabic pronunciation.
It is optimized to handle Arabic phonetics, diacritics (Tashkeel), and religious recitation styles.
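A character-level tokenizer treats each haraka as its own symbol. This happens because Unicode stores Arabic diacritics as combining marks that follow the base letter, not as part of the letter. The standard-library sketch below shows this on one diacritized word. It illustrates only the Unicode side. This model's actual vocabulary is defined in the .nemo file's configuration and is not reproduced here, and `char_tokens` and `is_diacritic` are hypothetical helper names.

```python
import unicodedata

def char_tokens(text):
    """Return the code points a character-level vocabulary would index.

    Harakat (fatha, kasra, sukun, shadda, ...) are separate code points that
    follow their base letter, so each one becomes its own token.
    """
    return list(text)

def is_diacritic(ch):
    # Non-spacing combining mark (category "Mn"), e.g. U+0650 ARABIC KASRA.
    return unicodedata.category(ch) == "Mn"

# "bismi" with diacritics: beh + kasra, seen + sukun, meem + kasra.
word = "\u0628\u0650\u0633\u0652\u0645\u0650"
for ch in char_tokens(word):
    kind = "diacritic" if is_diacritic(ch) else "letter"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({kind})")
```

The three base letters become six tokens. A model that outputs diacritized text therefore has to get each haraka right as a separate prediction.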

Model Architecture

  • Base Architecture: QuartzNet15x5
  • Framework: NVIDIA NeMo
  • Encoder: 1D time-channel separable convolutions (15 blocks x 5 sub-blocks)
  • Decoder: CTC (Connectionist Temporal Classification)
  • Tokenizer: Character-level (Arabic with diacritics)
  • Audio Features: 64-dimensional Mel-spectrogram
  • Sample Rate: 16 kHz
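QuartzNet's time-channel separable convolution splits a 1D convolution into two parts. A depthwise part applies one filter per channel and mixes along time. A pointwise 1x1 part mixes across channels. The parameter arithmetic below is a rough sketch of why this keeps the encoder small. The kernel size of 33 and the 256 channels are illustrative values, not read from this model's config, and biases are ignored.

```python
def conv1d_params(kernel, c_in, c_out):
    """Weights in a standard 1D convolution (bias ignored)."""
    return kernel * c_in * c_out

def separable_conv1d_params(kernel, c_in, c_out):
    """Weights in a time-channel separable 1D convolution (bias ignored).

    Depthwise: one length-`kernel` filter per input channel (time mixing).
    Pointwise: a 1x1 convolution mapping c_in -> c_out (channel mixing).
    """
    depthwise = kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Illustrative sizes only, not taken from this model's configuration.
K, C = 33, 256
full = conv1d_params(K, C, C)
sep = separable_conv1d_params(K, C, C)
print(full, sep, round(full / sep, 1))  # 2162688 73984 29.2
```

With these sizes the separable layer needs roughly 29 times fewer weights. This saving is what lets QuartzNet use long kernels.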

Intended Use

This model is suitable for:

  • Quranic recitation transcription
  • Qaida-based Arabic learning systems
  • Pronunciation evaluation and feedback
  • Educational and religious ASR applications
  • Arabic speech recognition with diacritics
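For pronunciation feedback, one possible approach is to compare the model's diacritized transcript with the expected text character by character. This reports a wrong or missing haraka even when the base letters are correct. The approach is not necessarily the authors' method. Below is a minimal sketch using Python's difflib; `diacritic_feedback` is a hypothetical helper name.

```python
import difflib

def diacritic_feedback(expected, recognized):
    """List character-level differences between expected and recognized text.

    Compares code points, so a wrong haraka shows up as its own edit
    even when the surrounding letters match.
    """
    matcher = difflib.SequenceMatcher(a=expected, b=recognized, autojunk=False)
    issues = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            issues.append((op, expected[i1:i2], recognized[j1:j2]))
    return issues

expected = "\u0628\u0650\u0633\u0652\u0645\u0650"    # bismi, kasra on the first letter
recognized = "\u0628\u064E\u0633\u0652\u0645\u0650"  # fatha (U+064E) instead of kasra
for op, want, got in diacritic_feedback(expected, recognized):
    print(op, [f"U+{ord(c):04X}" for c in want], [f"U+{ord(c):04X}" for c in got])
# replace ['U+0650'] ['U+064E']
```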

Training Details

Dataset

  • Custom Qaida Quranic Arabic dataset
  • Diacritized Arabic transcriptions
  • Carefully curated religious audio content
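The card does not describe how the dataset is stored on disk. NeMo ASR training usually reads JSON-lines manifests with `audio_filepath`, `duration` (in seconds), and `text` fields. The sketch below writes such entries for diacritized transcripts. The file paths and helper names are hypothetical.

```python
import json

def manifest_line(audio_filepath, duration, text):
    """One JSON line in NeMo's ASR manifest format."""
    entry = {"audio_filepath": audio_filepath, "duration": duration, "text": text}
    # ensure_ascii=False keeps Arabic letters and harakat readable in the file
    # instead of escaping them as \uXXXX sequences.
    return json.dumps(entry, ensure_ascii=False)

def write_manifest(path, rows):
    """Write (audio_filepath, duration, text) rows as a UTF-8 JSON-lines manifest."""
    with open(path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in rows:
            f.write(manifest_line(audio_filepath, duration, text) + "\n")

# Hypothetical file path and duration.
line = manifest_line("audio/qaida_0001.wav", 2.35, "\u0628\u0650\u0633\u0652\u0645\u0650")
print(line)
```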

Training Configuration

  • Optimizer: Novograd
  • Learning Rate: 0.01 (polynomial decay)
  • Batch Size: 32
  • Precision: FP32
  • Spectrogram Augmentation: Enabled
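Polynomial decay lowers the learning rate from its initial value toward a floor as training progresses. The card gives only the initial value (0.01). The total step count, the floor (`min_lr`), and the exponent (`power`) below are assumptions. NeMo's own scheduler also supports warmup, which this sketch omits.

```python
def polynomial_decay_lr(step, max_steps, base_lr=0.01, min_lr=0.0, power=1.0):
    """Polynomial learning-rate decay from base_lr to min_lr over max_steps.

    base_lr=0.01 matches the card; max_steps, min_lr and power are assumptions.
    """
    step = min(step, max_steps)  # hold at min_lr once the schedule ends
    remaining = 1.0 - step / max_steps
    return (base_lr - min_lr) * remaining ** power + min_lr

for s in (0, 2500, 5000, 7500, 10000):
    print(s, polynomial_decay_lr(s, max_steps=10000, power=2.0))
```

With `power=1.0` the decay is linear. Larger exponents drop the rate faster early on and flatten out near the end.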

Best Checkpoint

  • Checkpoint: QuartzNet15x5--val_wer=0.1131-epoch=61.ckpt
  • Epoch: 61
  • Validation WER: 0.1131 (11.31%)
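Word Error Rate (WER) counts the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, then divides by the number of reference words. A WER of 0.1131 therefore corresponds to about 11 errors per 100 reference words. If WER is computed on diacritized text, a single wrong haraka makes the whole word count as an error. Below is a standard-library sketch of the metric. It is not the exact evaluation script used for this card.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(word_error_rate("a b c d", "a x c d"))  # 0.25
```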

Usage

Installation

pip install "nemo_toolkit[asr]>=2.0"

Inference Example

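import nemo.collections.asr as nemo_asr

# Restore the model from a local .nemo file
asr_model = nemo_asr.models.ASRModel.restore_from("stt_arabic_quartznet15x5_v1.nemo")

# Input: 16 kHz WAV file(s); one result is returned per file
transcriptions = asr_model.transcribe(["audio.wav"])

# Depending on the NeMo version, each result is either a plain string or a
# Hypothesis object whose .text attribute holds the transcript
result = transcriptions[0]
print(result.text if hasattr(result, "text") else result)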
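With a CTC decoder, the model emits one character distribution per audio frame. Each distribution also includes a special blank symbol. The transcription step turns this sequence into text. Greedy decoding, the simplest strategy, is sketched below: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. This is a conceptual sketch, not NeMo's implementation. The toy vocabulary and the blank at index 0 are assumptions for illustration. The real model defines its own label set and blank index.

```python
# Toy vocabulary for illustration only: index 0 is the CTC blank by assumption.
VOCAB = ["<blank>", "\u0628", "\u0650", "\u0633", "\u0652", "\u0645"]
BLANK_ID = 0

def ctc_greedy_decode(frame_ids, vocabulary, blank_id):
    """Greedy CTC decoding of per-frame argmax ids into a string.

    Consecutive repeats of the same id are merged and blanks are dropped,
    so a blank between two identical ids keeps both characters.
    """
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit only when the id changes and is not the blank symbol.
        if idx != previous and idx != blank_id:
            chars.append(vocabulary[idx])
        previous = idx
    return "".join(chars)

# Hypothetical argmax ids for 12 frames.
frames = [1, 1, 0, 2, 3, 3, 0, 4, 5, 0, 2, 2]
# Decodes to bismi: U+0628 U+0650 U+0633 U+0652 U+0645 U+0650
print(ctc_greedy_decode(frames, VOCAB, BLANK_ID))
```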

Hugging Face Hub Usage

from huggingface_hub import hf_hub_download
import nemo.collections.asr as nemo_asr

# Download the .nemo file from the Hub (cached locally after the first download)
model_path = hf_hub_download(
    repo_id="9DTechnologies/QuartzNet_quran_v1",
    filename="stt_arabic_quartznet15x5_v1.nemo",
)

# Restore the model from the downloaded file
asr_model = nemo_asr.models.ASRModel.restore_from(model_path)
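The model expects 16 kHz audio. Python's built-in `wave` module can run a quick pre-flight check that catches mismatched files before they reach `asr_model.transcribe`. The mono check is an assumption, since the card states only the sample rate. Also note that `wave` can read only uncompressed PCM WAV files. `check_wav` is a hypothetical helper name.

```python
import wave

def check_wav(path, expected_rate=16000):
    """Return a list of format problems for a PCM WAV file (empty list = OK).

    The 16 kHz rate comes from the model card; the mono check is an assumption.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    return problems
```

If a file fails the check, you can resample it with the librosa and soundfile packages from the Requirements list. For example, call `y, sr = librosa.load(path, sr=16000, mono=True)` and then `soundfile.write(out_path, y, 16000)`.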

Requirements

  • Python >= 3.8
  • torch >= 2.0
  • nemo_toolkit[asr] >= 2.0
  • torchaudio
  • librosa
  • soundfile

Authors

  • Muhammad Haris Waqar, Tahir Ahmed Khan

Citation

@misc{quartznet_quranic_asr_2026,
  title={QuartzNet15x5 for Quranic Arabic Speech Recognition},
  author={Muhammad Haris Waqar and Tahir Ahmed Khan},
  year={2026},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/9DTechnologies/QuartzNet_quran_v1}}
}

License

Creative Commons Attribution 4.0 International (CC BY 4.0)


Evaluation results

  • Word Error Rate (Validation) on Qaida Quranic Arabic Test Set: 0.113 (self-reported)