# Quranic Recitation ASR: CTranslate2 Models
Pre-converted CTranslate2 (faster-whisper) models for Quranic recitation transcription, used in the Quranic Recitation Error Detection Pipeline.
Given audio of a Quranic verse, these models produce Arabic transcripts used downstream for error detection: substitutions, deletions, insertions, harakat errors, and Tajweed violations (medd, idgham, ikhfa, ghunna, qalqala, iqlab, izhar, tafkheem).
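At the word level, the first three error categories (substitutions, deletions, insertions) can be recovered by aligning the ASR transcript against the reference verse text. Below is a minimal sketch using Python's `difflib`; the pipeline's actual alignment logic may differ, and the Latin-script words are placeholders for real Arabic verse text. Harakat and Tajweed checks would need character- and diacritic-level analysis beyond this sketch.

```python
import difflib

def classify_word_errors(reference: str, hypothesis: str):
    """Align reference and hypothesis word sequences and label the
    differences as substitutions, deletions, or insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    errors = []
    matcher = difflib.SequenceMatcher(a=ref, b=hyp)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "replace":
            errors.append(("substitution", ref[i1:i2], hyp[j1:j2]))
        elif op == "delete":
            errors.append(("deletion", ref[i1:i2], []))
        elif op == "insert":
            errors.append(("insertion", [], hyp[j1:j2]))
    return errors

# Toy example: second word substituted, last word dropped.
print(classify_word_errors("a b c d", "a x c"))
```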
## Models Included

### `whisper-quran-ct2/`: Recommended for CPU / production
| Property | Value |
|---|---|
| Source model | tarteel-ai/whisper-base-ar-quran |
| Architecture | Whisper Base (~74M parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~73 MB |
| Reported WER | ~15% (model card) |
| Speed (CPU, 10 s audio) | ~1–2 s |
| Memory | ~150 MB |
Fine-tuned Whisper Base specialised for Quranic Arabic. Fast enough for CPU deployment and production use. Default backend in the pipeline.
### `whisper-quran-v1-ct2/`: High accuracy (use the HuggingFace backend)
| Property | Value |
|---|---|
| Source model | wasimlhr/whisper-quran-v1 |
| Architecture | Whisper Large-v3 (~1.55B parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~2.9 GB |
| Reported WER | ~5.35% (model card) |
| Speed (CPU, 10 s audio) | ~15–20 s |
| Memory | ~3 GB |
**Note:** int8 CTranslate2 conversion of this large fine-tuned model degrades transcription quality. For best results, use the original HuggingFace model directly with `--backend huggingface --model wasimlhr/whisper-quran-v1`. This CT2 version is included for reference and speed experiments only.
## Usage

### With faster-whisper directly
```python
from faster_whisper import WhisperModel

model = WhisperModel("kaylazima/quranic-model/whisper-quran-ct2", device="cpu", compute_type="int8")
segments, _ = model.transcribe("recitation.wav", language="ar", word_timestamps=True)
for seg in segments:
    print(seg.text)
```
### With the Quranic Pipeline
```bash
# Clone pipeline
git clone <repo-url> && cd quranic-pipeline

# Run with pre-downloaded CT2 model
python scripts/run_pipeline.py \
    --audio recitation.wav \
    --surah 1 --ayah 1 \
    --backend faster-whisper \
    --model_dir models/whisper-quran-ct2/
```
### Docker

```bash
docker compose run pipeline --surah 1 --ayah 1 --audio data/samples/mock.wav --verbose
```
## Benchmark Results

Evaluated on Buraaq/quran-md-ayahs (Surah 37, ayahs 78–87, Alafasy reciter, 10 samples). Ground-truth WER = 0 (professional reciter); observed WER reflects the ASR hallucination rate.
| Model | Backend | Mean WER | Word-level F1 | Avg time/ayah |
|---|---|---|---|---|
| whisper-quran-ct2 (tarteel-ai base) | faster-whisper int8 | 0.613 | 0.786 | ~5.3 s (CPU) |
| wasimlhr HuggingFace original | HF float32 | 0.020 | 0.977 | ~18.6 s (CPU) |
The tarteel-ai model hallucinates tail phrases on short ayahs; the wasimlhr model (HF backend) achieves near-perfect transcription, with only a minor hamza normalisation difference.
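For reference, the WER figures above are word-level edit distance divided by reference length. A self-contained sketch of that metric (independent of whatever scoring code the pipeline actually uses; toy Latin-script words stand in for Arabic):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution / match
        prev = cur
    return prev[-1] / max(len(ref), 1)

# One substitution + one deletion over 4 reference words -> 0.5
print(wer("a b c d", "a x c"))
```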
## Model Conversion

Models were converted using `ct2-transformers-converter`:
```bash
ct2-transformers-converter \
    --model tarteel-ai/whisper-base-ar-quran \
    --output_dir whisper-quran-ct2 \
    --quantization int8
```
If conversion fails with a `dtype` kwarg error (CTranslate2 ≥ 4.4), a monkey-patch workaround is documented in the pipeline repository.
## License

Derived from the source models; see the original model cards for license terms:

- tarteel-ai/whisper-base-ar-quran
- wasimlhr/whisper-quran-v1