---
language: ar
license: mit
tags:
- whisper
- arabic
- quran
- ctranslate2
- faster-whisper
- speech-recognition
- tajweed
pipeline_tag: automatic-speech-recognition
---

# Quranic Recitation ASR — CTranslate2 Models

Pre-converted [CTranslate2](https://github.com/OpenNMT/CTranslate2) (faster-whisper) models for Quranic recitation transcription, used in the **Quranic Recitation Error Detection Pipeline**.

Given audio of a Quranic verse, these models produce Arabic transcripts used downstream for error detection: substitutions, deletions, insertions, harakat errors, and Tajweed violations (medd, idgham, ikhfa, ghunna, qalqala, iqlab, izhar, tafkheem).
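
Classifying word-level differences between a reference verse and a transcript can be sketched with standard sequence alignment. The snippet below is an illustrative example using Python's `difflib`, not the pipeline's actual detector (which additionally handles harakat and Tajweed rules):

```python
from difflib import SequenceMatcher

def classify_errors(reference: str, hypothesis: str):
    """Label word-level differences as substitutions, deletions, insertions."""
    ref, hyp = reference.split(), hypothesis.split()
    errors = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if tag == "replace":
            errors.append(("substitution", ref[i1:i2], hyp[j1:j2]))
        elif tag == "delete":
            errors.append(("deletion", ref[i1:i2], []))
        elif tag == "insert":
            errors.append(("insertion", [], hyp[j1:j2]))
    return errors
```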

---

## Models Included

### `whisper-quran-ct2/` — Recommended for CPU / production

| Property | Value |
|----------|-------|
| Source model | [`tarteel-ai/whisper-base-ar-quran`](https://huggingface.co/tarteel-ai/whisper-base-ar-quran) |
| Architecture | Whisper Base (~74M parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~73 MB |
| Reported WER | ~15% (model card) |
| Speed (CPU, 10 s audio) | ~1–2 s |
| Memory | ~150 MB |

Fine-tuned Whisper Base specialised for Quranic Arabic. Fast enough for CPU deployment and production use. Default backend in the pipeline.

### `whisper-quran-v1-ct2/` — High accuracy (use HuggingFace backend)

| Property | Value |
|----------|-------|
| Source model | [`wasimlhr/whisper-quran-v1`](https://huggingface.co/wasimlhr/whisper-quran-v1) |
| Architecture | Whisper Large-v3 (~1.55B parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~2.9 GB |
| Reported WER | ~5.35% (model card) |
| Speed (CPU, 10 s audio) | ~15–20 s |
| Memory | ~3 GB |

> **Note:** int8 CTranslate2 conversion of this large fine-tuned model degrades transcription quality. For best results, use the original HuggingFace model directly with `--backend huggingface --model wasimlhr/whisper-quran-v1`. This CT2 version is included for reference and speed experiments only.

---

## Usage

### With faster-whisper directly

```python
from faster_whisper import WhisperModel

# Load the CT2 model from a local directory. faster-whisper expects either a
# local folder or a plain Hugging Face repo id, so download/clone this repo's
# whisper-quran-ct2/ subfolder first and point at it here.
model = WhisperModel("whisper-quran-ct2", device="cpu", compute_type="int8")
segments, _ = model.transcribe("recitation.wav", language="ar", word_timestamps=True)
for seg in segments:
    print(seg.text)
```

### With the Quranic Pipeline

```bash
# Clone the pipeline
git clone <repo-url> && cd quranic-pipeline

# Run with the pre-downloaded CT2 model
python scripts/run_pipeline.py \
    --audio recitation.wav \
    --surah 1 --ayah 1 \
    --backend faster-whisper \
    --model_dir models/whisper-quran-ct2/
```

### Docker

```bash
docker compose run pipeline --surah 1 --ayah 1 --audio data/samples/mock.wav --verbose
```

---

## Benchmark Results

Evaluated on **Buraaq/quran-md-ayahs** (Surah 37, ayahs 78–87, Alafasy reciter, 10 samples). Ground-truth WER = 0 (professional reciter); observed WER reflects the ASR hallucination rate.
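
The WER figures reported here are word-level edit distance divided by reference length. A minimal self-contained sketch of that metric (the pipeline's actual evaluation code may differ):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)
```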

| Model | Backend | Mean WER | Word-level F1 | Avg time/ayah |
|-------|---------|----------|---------------|---------------|
| whisper-quran-ct2 (tarteel-ai base) | faster-whisper int8 | 0.613 | 0.786 | ~5.3 s (CPU) |
| wasimlhr HuggingFace original | HF float32 | 0.020 | 0.977 | ~18.6 s (CPU) |

tarteel-ai hallucinates tail phrases on short ayahs; wasimlhr (HF backend) achieves near-perfect transcription with one minor hamza normalisation difference.
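
Hamza normalisation typically maps hamza-carrier forms of alef onto a bare alef before comparing transcripts. A hedged sketch of such a normaliser (the pipeline's exact normalisation rules are an assumption here):

```python
# Assumption: the scoring normalises alef-hamza variants to bare alef;
# the pipeline's actual rules may cover more characters.
HAMZA_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
})

def normalize_hamza(text: str) -> str:
    """Collapse alef-hamza variants so they compare equal to bare alef."""
    return text.translate(HAMZA_MAP)
```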

---

## Model Conversion

Models were converted using `ct2-transformers-converter`:

```bash
ct2-transformers-converter \
    --model tarteel-ai/whisper-base-ar-quran \
    --output_dir whisper-quran-ct2 \
    --quantization int8
```

> If conversion fails with a `dtype` kwarg error (ctranslate2 ≥ 4.4), a monkey-patch workaround is documented in the pipeline repository.

---

## License

Derived from the source models; see the original model cards for their license terms:

- [tarteel-ai/whisper-base-ar-quran](https://huggingface.co/tarteel-ai/whisper-base-ar-quran)
- [wasimlhr/whisper-quran-v1](https://huggingface.co/wasimlhr/whisper-quran-v1)