---
language: ar
license: mit
tags:
- whisper
- arabic
- quran
- ctranslate2
- faster-whisper
- speech-recognition
- tajweed
pipeline_tag: automatic-speech-recognition
---

# Quranic Recitation ASR — CTranslate2 Models

Pre-converted [CTranslate2](https://github.com/OpenNMT/CTranslate2) (faster-whisper) models for Quranic recitation transcription, used in the **Quranic Recitation Error Detection Pipeline**.

Given audio of a Quranic verse, these models produce Arabic transcripts used downstream for error detection — substitutions, deletions, insertions, harakat errors, and Tajweed violations (medd, idgham, ikhfa, ghunna, qalqala, iqlab, izhar, tafkheem).

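As an illustration of that downstream step, the substitution/deletion/insertion classes can be recovered by aligning the reference and hypothesis word sequences with Levenshtein backtracking. This is a minimal sketch, not the pipeline's actual implementation; harakat and Tajweed checks require character- and rule-level analysis on top of this.

```python
def align_words(ref, hyp):
    """Classify word-level edits (substitution/deletion/insertion) via Levenshtein alignment."""
    m, n = len(ref), len(hyp)
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Backtrack to recover the edit operations
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                ops.append(("substitution", ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("deletion", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, hyp[j - 1]))
            j -= 1
    return list(reversed(ops))
```

For example, `align_words("a b c".split(), "a x c d".split())` reports one substitution (`b` → `x`) and one trailing insertion (`d`), the pattern seen in the hallucinated tail phrases discussed in the benchmark section below.
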
---

## Models Included

### `whisper-quran-ct2/` — Recommended for CPU / production

| Property | Value |
|----------|-------|
| Source model | [`tarteel-ai/whisper-base-ar-quran`](https://huggingface.co/tarteel-ai/whisper-base-ar-quran) |
| Architecture | Whisper Base (~74M parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~73 MB |
| Reported WER | ~15% (model card) |
| Speed (CPU, 10 s audio) | ~1–2 s |
| Memory | ~150 MB |

Fine-tuned Whisper Base specialised for Quranic Arabic. Fast enough for CPU deployment and production use; this is the pipeline's default backend.

### `whisper-quran-v1-ct2/` — High accuracy (use the HuggingFace backend)

| Property | Value |
|----------|-------|
| Source model | [`wasimlhr/whisper-quran-v1`](https://huggingface.co/wasimlhr/whisper-quran-v1) |
| Architecture | Whisper Large-v3 (~1.55B parameters) |
| Quantization | int8 (CTranslate2) |
| Size | ~2.9 GB |
| Reported WER | ~5.35% (model card) |
| Speed (CPU, 10 s audio) | ~15–20 s |
| Memory | ~3 GB |

> **Note:** int8 CTranslate2 conversion of this large fine-tuned model degrades transcription quality. For best results, use the original HuggingFace model directly with `--backend huggingface --model wasimlhr/whisper-quran-v1`. This CT2 version is included for reference and speed experiments only.

---

## Usage

### With faster-whisper directly

`WhisperModel` accepts a local directory or a plain `namespace/name` Hub repo ID, not a repo-plus-subfolder path, so download the subfolder first:

```python
from huggingface_hub import snapshot_download
from faster_whisper import WhisperModel

# Fetch only the whisper-quran-ct2 subfolder of this repo
local_dir = snapshot_download("kaylazima/quranic-model", allow_patterns=["whisper-quran-ct2/*"])

model = WhisperModel(f"{local_dir}/whisper-quran-ct2", device="cpu", compute_type="int8")
segments, _ = model.transcribe("recitation.wav", language="ar", word_timestamps=True)
for seg in segments:
    print(seg.text)
```

### With the Quranic Pipeline

```bash
# Clone the pipeline
git clone <repo-url> && cd quranic-pipeline

# Run with the pre-downloaded CT2 model
python scripts/run_pipeline.py \
    --audio recitation.wav \
    --surah 1 --ayah 1 \
    --backend faster-whisper \
    --model_dir models/whisper-quran-ct2/
```

### Docker

```bash
docker compose run pipeline --surah 1 --ayah 1 --audio data/samples/mock.wav --verbose
```

---

## Benchmark Results

Evaluated on **Buraaq/quran-md-ayahs** (Surah 37, ayahs 78–87, Alafasy reciter, 10 samples). The professional recitations match the reference text exactly, so observed WER reflects ASR errors and hallucinations rather than reciter mistakes.

| Model | Backend | Mean WER | Word-level F1 | Avg time/ayah |
|-------|---------|----------|---------------|---------------|
| whisper-quran-ct2 (tarteel-ai base) | faster-whisper int8 | 0.613 | 0.786 | ~5.3 s (CPU) |
| wasimlhr HuggingFace original | HF float32 | 0.020 | 0.977 | ~18.6 s (CPU) |

The tarteel-ai model hallucinates tail phrases on short ayahs; wasimlhr (HF backend) achieves near-perfect transcription, with one minor hamza-normalisation difference.

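The WER column can be reproduced from paired transcripts with a standard word-level edit distance. This is a minimal sketch; the pipeline's own scorer and any Arabic text normalisation it applies (e.g. for hamza variants) may differ.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j - 1] + (r != h),  # match / substitution
                            prev[j] + 1,             # deletion
                            curr[j - 1] + 1))        # insertion
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

For example, a four-word reference with one substituted word scores `wer("a b c d", "a b x d") == 0.25`.
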
---

## Model Conversion

Models were converted using `ct2-transformers-converter`:

```bash
ct2-transformers-converter \
    --model tarteel-ai/whisper-base-ar-quran \
    --output_dir whisper-quran-ct2 \
    --quantization int8
```

> If conversion fails with a `dtype` kwarg error (ctranslate2 ≥4.4), a monkey-patch workaround is documented in the pipeline repository.

---

## License

Derived from source models; see the original model cards for license terms:
- [tarteel-ai/whisper-base-ar-quran](https://huggingface.co/tarteel-ai/whisper-base-ar-quran)
- [wasimlhr/whisper-quran-v1](https://huggingface.co/wasimlhr/whisper-quran-v1)