# Quranic Recitation ASR – CTranslate2 Models

Pre-converted CTranslate2 (faster-whisper) models for Quranic recitation transcription, used in the Quranic Recitation Error Detection Pipeline.

Given audio of a Quranic verse, these models produce Arabic transcripts used downstream for error detection: substitutions, deletions, insertions, harakat errors, and Tajweed violations (medd, idgham, ikhfa, ghunna, qalqala, iqlab, izhar, tafkheem).
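The word-level error categories above are typically derived by aligning the ASR transcript against the reference verse text. As an illustrative sketch only (not the pipeline's actual implementation), a word alignment with Python's standard-library `difflib` classifies substitutions, deletions, and insertions; harakat and Tajweed errors need character- and phoneme-level analysis beyond this:

```python
import difflib

def classify_word_errors(reference: str, hypothesis: str):
    """Align two word sequences and label substitution/deletion/insertion errors.

    Illustrative sketch only: the real pipeline also inspects harakat
    (diacritics) and Tajweed rules, which plain word alignment cannot see.
    """
    ref, hyp = reference.split(), hypothesis.split()
    errors = []
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            errors.append(("substitution", ref[i1:i2], hyp[j1:j2]))
        elif op == "delete":
            errors.append(("deletion", ref[i1:i2], []))
        elif op == "insert":
            errors.append(("insertion", [], hyp[j1:j2]))
    return errors
```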


## Models Included

### whisper-quran-ct2/ – Recommended for CPU / production

| Property | Value |
|---|---|
| Source model | tarteel-ai/whisper-base-ar-quran |
| Architecture | Whisper Base (~74M parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~73 MB |
| Reported WER | ~15% (model card) |
| Speed (CPU, 10 s audio) | ~1–2 s |
| Memory | ~150 MB |

Fine-tuned Whisper Base specialised for Quranic Arabic. Fast enough for CPU deployment and production use. Default backend in the pipeline.

### whisper-quran-v1-ct2/ – High accuracy (use HuggingFace backend)

| Property | Value |
|---|---|
| Source model | wasimlhr/whisper-quran-v1 |
| Architecture | Whisper Large-v3 (~1.55B parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~2.9 GB |
| Reported WER | ~5.35% (model card) |
| Speed (CPU, 10 s audio) | ~15–20 s |
| Memory | ~3 GB |

**Note:** int8 CTranslate2 conversion of this large fine-tuned model degrades transcription quality. For best results, use the original HuggingFace model directly with `--backend huggingface --model wasimlhr/whisper-quran-v1`. This CT2 version is included for reference and speed experiments only.
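Outside the pipeline, the full-precision model can also be run directly with the `transformers` ASR pipeline. A minimal sketch; the `chunk_length_s=30` setting and the `build_quran_asr` helper name are assumptions for illustration, not from the model card:

```python
def build_quran_asr(model_id: str = "wasimlhr/whisper-quran-v1"):
    """Build a transformers ASR pipeline for the full-precision model.

    The import is deferred so this module loads without transformers
    installed; the first call downloads ~3 GB of weights. chunk_length_s=30
    is an assumed setting for long recitations, not from the model card.
    """
    from transformers import pipeline  # deferred heavy import

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
        generate_kwargs={"language": "arabic", "task": "transcribe"},
    )

# Usage (downloads the model on first run):
# asr = build_quran_asr()
# print(asr("recitation.wav")["text"])
```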


## Usage

### With faster-whisper directly

```python
from faster_whisper import WhisperModel

# Model path: HF repo path or a local directory (e.g. models/whisper-quran-ct2/)
model = WhisperModel("kaylazima/quranic-model/whisper-quran-ct2", device="cpu", compute_type="int8")
segments, _ = model.transcribe("recitation.wav", language="ar", word_timestamps=True)
for seg in segments:
    print(seg.text)
    for word in seg.words:  # available because word_timestamps=True above
        print(f"  {word.start:.2f}-{word.end:.2f} {word.word}")
```

### With the Quranic Pipeline

```bash
# Clone pipeline
git clone <repo-url> && cd quranic-pipeline

# Run with pre-downloaded CT2 model
python scripts/run_pipeline.py \
    --audio recitation.wav \
    --surah 1 --ayah 1 \
    --backend faster-whisper \
    --model_dir models/whisper-quran-ct2/
```

### Docker

```bash
docker compose run pipeline --surah 1 --ayah 1 --audio data/samples/mock.wav --verbose
```

## Benchmark Results

Evaluated on Buraaq/quran-md-ayahs (Surah 37, ayahs 78–87, Alafasy reciter, 10 samples). The reference recitations are error-free (professional reciter), so any non-zero WER reflects ASR mistakes, chiefly hallucinations.

| Model | Backend | Mean WER | Word-level F1 | Avg time/ayah |
|---|---|---|---|---|
| whisper-quran-ct2 (tarteel-ai base) | faster-whisper int8 | 0.613 | 0.786 | ~5.3 s (CPU) |
| wasimlhr HuggingFace original | HF float32 | 0.020 | 0.977 | ~18.6 s (CPU) |

The tarteel-ai model hallucinates tail phrases on short ayahs; wasimlhr (HF backend) achieves near-perfect transcription, with one minor hamza-normalisation difference.
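For reference, WER figures of this kind follow from a standard word-level edit distance. A minimal self-contained sketch; real evaluations normally apply Arabic text normalisation (hamza and diacritic handling) before scoring:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    via a Levenshtein dynamic programme over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```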


## Model Conversion

Models were converted using `ct2-transformers-converter`:

```bash
ct2-transformers-converter \
    --model tarteel-ai/whisper-base-ar-quran \
    --output_dir whisper-quran-ct2 \
    --quantization int8
```

If conversion fails with a `dtype` kwarg error (ctranslate2 ≥ 4.4), a monkey-patch workaround is documented in the pipeline repository.


## License

Derived from source models; see the original model cards for license terms:

- tarteel-ai/whisper-base-ar-quran
- wasimlhr/whisper-quran-v1
