# Quranic Recitation ASR – CTranslate2 Models

Pre-converted CTranslate2 (faster-whisper) models for Quranic recitation transcription, used in the Quranic Recitation Error Detection Pipeline.

Given audio of a Quranic verse, these models produce Arabic transcripts used downstream for error detection: substitutions, deletions, insertions, harakat errors, and Tajweed violations (medd, idgham, ikhfa, ghunna, qalqala, iqlab, izhar, tafkheem).
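The word-level error categories above are typically derived by aligning the ASR transcript against the reference verse text. As an illustrative sketch only (not the pipeline's actual implementation), a word alignment with Python's standard-library `difflib` classifies substitutions, deletions, and insertions; harakat and Tajweed errors need character- and phoneme-level analysis beyond this:

```python
import difflib

def classify_word_errors(reference: str, hypothesis: str):
    """Align two word sequences and label substitution/deletion/insertion errors.

    Illustrative sketch only: the real pipeline also inspects harakat
    (diacritics) and Tajweed rules, which plain word alignment cannot see.
    """
    ref, hyp = reference.split(), hypothesis.split()
    errors = []
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            errors.append(("substitution", ref[i1:i2], hyp[j1:j2]))
        elif op == "delete":
            errors.append(("deletion", ref[i1:i2], []))
        elif op == "insert":
            errors.append(("insertion", [], hyp[j1:j2]))
    return errors
```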


## Models Included

### whisper-quran-ct2/ – Recommended for CPU / production

| Property | Value |
|---|---|
| Source model | tarteel-ai/whisper-base-ar-quran |
| Architecture | Whisper Base (~74M parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~73 MB |
| Reported WER | ~15% (model card) |
| Speed (CPU, 10 s audio) | ~1–2 s |
| Memory | ~150 MB |

Fine-tuned Whisper Base specialised for Quranic Arabic. Fast enough for CPU deployment and production use. Default backend in the pipeline.

### whisper-quran-v1-ct2/ – High accuracy (use HuggingFace backend)

| Property | Value |
|---|---|
| Source model | wasimlhr/whisper-quran-v1 |
| Architecture | Whisper Large-v3 (~1.55B parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~2.9 GB |
| Reported WER | ~5.35% (model card) |
| Speed (CPU, 10 s audio) | ~15–20 s |
| Memory | ~3 GB |

**Note:** int8 CTranslate2 conversion of this large fine-tuned model degrades transcription quality. For best results, use the original HuggingFace model directly with `--backend huggingface --model wasimlhr/whisper-quran-v1`. This CT2 version is included for reference and speed experiments only.
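Outside the pipeline, the full-precision model can also be run directly with the `transformers` ASR pipeline. A minimal sketch; the `chunk_length_s=30` setting and the `build_quran_asr` helper name are assumptions for illustration, not from the model card:

```python
def build_quran_asr(model_id: str = "wasimlhr/whisper-quran-v1"):
    """Build a transformers ASR pipeline for the full-precision model.

    The import is deferred so this module loads without transformers
    installed; the first call downloads ~3 GB of weights. chunk_length_s=30
    is an assumed setting for long recitations, not from the model card.
    """
    from transformers import pipeline  # deferred heavy import

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
        generate_kwargs={"language": "arabic", "task": "transcribe"},
    )

# Usage (downloads the model on first run):
# asr = build_quran_asr()
# print(asr("recitation.wav")["text"])
```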


## Usage

### With faster-whisper directly

```python
from faster_whisper import WhisperModel

# Model path: HF repo path or a local directory (e.g. models/whisper-quran-ct2/)
model = WhisperModel("kaylazima/quranic-model/whisper-quran-ct2", device="cpu", compute_type="int8")
segments, _ = model.transcribe("recitation.wav", language="ar", word_timestamps=True)
for seg in segments:
    print(seg.text)
    for word in seg.words:  # available because word_timestamps=True above
        print(f"  {word.start:.2f}-{word.end:.2f} {word.word}")
```

### With the Quranic Pipeline

```bash
# Clone pipeline
git clone <repo-url> && cd quranic-pipeline

# Run with pre-downloaded CT2 model
python scripts/run_pipeline.py \
    --audio recitation.wav \
    --surah 1 --ayah 1 \
    --backend faster-whisper \
    --model_dir models/whisper-quran-ct2/
```

### Docker

```bash
docker compose run pipeline --surah 1 --ayah 1 --audio data/samples/mock.wav --verbose
```

## Benchmark Results

Evaluated on Buraaq/quran-md-ayahs (Surah 37, ayahs 78–87, Alafasy reciter, 10 samples). The reference recitations are error-free (professional reciter), so any non-zero WER reflects ASR mistakes, chiefly hallucinations.

| Model | Backend | Mean WER | Word-level F1 | Avg time/ayah |
|---|---|---|---|---|
| whisper-quran-ct2 (tarteel-ai base) | faster-whisper int8 | 0.613 | 0.786 | ~5.3 s (CPU) |
| wasimlhr HuggingFace original | HF float32 | 0.020 | 0.977 | ~18.6 s (CPU) |

The tarteel-ai model hallucinates tail phrases on short ayahs; wasimlhr (HF backend) achieves near-perfect transcription, with one minor hamza-normalisation difference.
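For reference, WER figures of this kind follow from a standard word-level edit distance. A minimal self-contained sketch; real evaluations normally apply Arabic text normalisation (hamza and diacritic handling) before scoring:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    via a Levenshtein dynamic programme over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```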


## Model Conversion

Models were converted using `ct2-transformers-converter`:

```bash
ct2-transformers-converter \
    --model tarteel-ai/whisper-base-ar-quran \
    --output_dir whisper-quran-ct2 \
    --quantization int8
```

If conversion fails with a `dtype` kwarg error (ctranslate2 ≥ 4.4), a monkey-patch workaround is documented in the pipeline repository.


## License

Derived from source models; see the original model cards for license terms:

- tarteel-ai/whisper-base-ar-quran
- wasimlhr/whisper-quran-v1
