| --- |
| colorFrom: green |
| colorTo: blue |
| sdk: static |
| pinned: false |
| --- |
| |
| <div align="center"> |
|
|
| <img src="https://huggingface.co/spaces/Quran-Lab/README/resolve/main/assets/quran-lab-banner.svg" alt="Quran Lab - Quranic ASR infrastructure" width="100%" /> |
|
|
| <table> |
| <tr> |
| <td align="center" bgcolor="#003f37"> |
| <br /> |
| <font color="#f8e7b4" size="7"><strong>Quran Lab</strong></font> |
| <br /> |
| <font color="#fff8df"><strong>Quranic ASR benchmarks, raw cloud results, and reproducible Arabic speech evaluation.</strong></font> |
| <br /><br /> |
| </td> |
| </tr> |
| </table> |
| |
| <img src="https://huggingface.co/spaces/Quran-Lab/README/resolve/main/assets/quran-lab-wordmark.svg" alt="" width="360" /> |
|
|
| [Quranic ASR Benchmark](https://huggingface.co/datasets/Quran-Lab/quranic-asr-benchmark) · [Cloud ASR Raw Data](https://huggingface.co/datasets/Quran-Lab/quranic-asr-cloud-rawdata) · [Leaderboard](https://huggingface.co/spaces/Quran-Lab/quranic-asr-leaderboard) |
|
|
| **600 benchmark clips** · **2.14 hours** · **6 provider runs** · **raw responses, hypotheses, scripts, and scores** |
|
|
| </div> |
|
|
| ## Mission |
|
|
| Quran Lab builds high-quality public resources for evaluating and improving ASR on Quranic recitation. Our focus is simple: leakage-free benchmarks, transparent scoring, reproducible artifacts, and practical tools for researchers working on Arabic and Quranic speech. |
|
|
| We care about evaluations that measure generalization rather than memorization, especially on real recitation audio and real-world recording conditions. |
|
|
| ## At a Glance |
|
|
| | Focus | Current Work | |
| | --- | --- | |
| | Benchmarking | Leakage-free Quranic ASR test set with 600 clips across studio, held-out reciter, and real phone-mic sources. | |
| | Evaluation | Official WER/CER scorer with Quranic normalization and alef-insensitive reporting. | |
| | Reproducibility | Published hypotheses, raw cloud responses, score files, scripts, and run notes. | |
| | Applied ASR | Provider comparisons for Tarteel, Google Chirp 3, ElevenLabs, Deepgram, and Speechmatics. | |
|
|
| ## Featured Resources |
|
|
| | Resource | Type | Description | |
| | --- | --- | --- | |
| | [quranic-asr-benchmark](https://huggingface.co/datasets/Quran-Lab/quranic-asr-benchmark) | Dataset | 600-clip leakage-free Quranic ASR benchmark with held-out reciters and real phone recordings. | |
| | [quranic-asr-cloud-rawdata](https://huggingface.co/datasets/Quran-Lab/quranic-asr-cloud-rawdata) | Dataset | Raw cloud/provider ASR outputs, normalized hypotheses, score tables, scripts, and reproducibility artifacts. | |
| | [quranic-asr-leaderboard](https://huggingface.co/spaces/Quran-Lab/quranic-asr-leaderboard) | Space | Live leaderboard for comparing ASR systems on the same benchmark and scorer. | |
|
|
| ## Current Benchmark Snapshot |
|
|
| | System | Overall WER (%) | Notes | |
| | --- | ---: | --- | |
| | Tarteel official | 10.99 | Official realtime WebSocket path | |
| | Google Chirp 3 sync | 11.92 | Speech-to-Text v2, `chirp_3`, location `us` | |
| | Google Chirp 3 realtime | 13.56 | StreamingRecognize with realtime pacing | |
| | ElevenLabs Scribe v2 | 14.05 | Arabic language hint | |
| | Deepgram nova-3 | 15.79 | Arabic language hint | |
| | Speechmatics enhanced | 21.06 | Enhanced operating point | |
|
|
| Scores are produced with the official scorer from `quranic-asr-benchmark`; lower is better. The scorer reports WER and CER, plus an alef-insensitive variant so that alef spelling differences in Quranic orthography are not counted as errors. The table above lists overall WER only. |
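The official scorer lives in `quranic-asr-benchmark`, and the sketch below is not that scorer. It is a minimal, hypothetical illustration of how corpus-level WER/CER and an alef-insensitive variant can be computed. It assumes that normalization strips Arabic diacritics and that the alef-insensitive metric folds the variants أ, إ, آ and ٱ (alef wasla) to bare alef ا. The helper names (`normalize`, `edit_distance`, `error_rate`), the Unicode ranges, and the corpus-level aggregation are assumptions; check the benchmark's scorer for its actual rules.

```python
"""Hypothetical WER/CER sketch with an alef-insensitive option.

NOT the official quranic-asr-benchmark scorer: the normalization rules and
Unicode ranges below are illustrative assumptions.
"""
import re

# Assumed diacritic set: harakat/tanween/shadda/sukun (U+064B-U+065F),
# superscript (dagger) alef U+0670, and Quranic annotation marks.
_DIACRITICS = re.compile(r"[\u0610-\u061A\u064B-\u065F\u0670\u06D6-\u06ED]")

# Assumed alef folding: آ أ إ ٱ -> ا (bare alef, U+0627).
_ALEF_FOLD = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
    "\u0671": "\u0627",  # alef wasla
})


def normalize(text: str, alef_insensitive: bool = False) -> str:
    """Strip diacritics, optionally fold alef variants, collapse whitespace."""
    text = _DIACRITICS.sub("", text)
    if alef_insensitive:
        text = text.translate(_ALEF_FOLD)
    return " ".join(text.split())


def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences, using one rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference token
                cur[j - 1] + 1,          # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def error_rate(refs, hyps, unit="word", alef_insensitive=False) -> float:
    """Corpus-level WER (unit='word') or CER (unit='char'), in percent.

    Errors are summed over all clips and divided by the total number of
    reference tokens (an assumption about how clips are aggregated).
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        ref_n = normalize(ref, alef_insensitive)
        hyp_n = normalize(hyp, alef_insensitive)
        if unit == "word":
            ref_toks, hyp_toks = ref_n.split(), hyp_n.split()
        elif unit == "char":
            # Assumption: spaces are ignored when computing CER.
            ref_toks = list(ref_n.replace(" ", ""))
            hyp_toks = list(hyp_n.replace(" ", ""))
        else:
            raise ValueError("unit must be 'word' or 'char'")
        errors += edit_distance(ref_toks, hyp_toks)
        total += len(ref_toks)
    if total == 0:
        raise ValueError("references contain no tokens")
    return 100.0 * errors / total
```

For example, with this sketch the hypothesis `بسم الله` scored against the reference `بِسْمِ ٱللَّهِ` gives a WER of 50.0, because ٱ (alef wasla) and ا are different characters, and 0.0 when `alef_insensitive=True`. Reporting both numbers separates genuine recognition errors from orthographic convention.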
|
|
| ## What We Value |
|
|
| - Leakage-aware evaluation: held-out reciters and clips absent from training data. |
| - Reproducibility: raw responses, hypotheses, scripts, and score files are published when possible. |
| - Responsible data handling: source audio access and redistribution rules are kept with the source benchmark. |
| - Quranic Arabic focus: scoring and reporting account for Quranic orthography and recitation-specific challenges. |
|
|
| ## Repository Layout |
|
|
| | Category | Status | |
| | --- | --- | |
| | Public datasets | Benchmark data and cloud ASR outputs are available. | |
| | Public spaces | Leaderboard is available. | |
| | Public models | No public models yet. | |
|
|
| ## Citation |
|
|
| If you use Quran Lab resources, please cite the specific dataset or Space you used. Dataset cards include citation metadata and usage notes. |
|
|
| ## Contact |
|
|
| For benchmark questions, corrections, rights-holder requests, or collaboration ideas, open a discussion on the relevant dataset or Space. |
|
|