metadata
colorFrom: green
colorTo: blue
sdk: static
pinned: false
Quran Lab - Quranic ASR infrastructure

Quran Lab
Quranic ASR benchmarks, raw cloud results, and reproducible Arabic speech evaluation.

Quranic ASR Benchmark | Cloud ASR Raw Data | Leaderboard

600 benchmark clips | 2.14 hours | 6 provider runs | raw responses, hypotheses, scripts, and scores

Mission

Quran Lab builds high-quality public resources for evaluating and improving ASR on Quranic recitation. Our focus is simple: leakage-free benchmarks, transparent scoring, reproducible artifacts, and practical tools for researchers working on Arabic and Quranic speech.

We care about evaluations that measure generalization rather than memorization, especially on real recitation audio and real-world recording conditions.

At A Glance

| Focus | Current Work |
| --- | --- |
| Benchmarking | Leakage-free Quranic ASR test set with 600 clips across studio, held-out reciter, and real phone-mic sources. |
| Evaluation | Official WER/CER scorer with Quranic normalization and alef-insensitive reporting. |
| Reproducibility | Published hypotheses, raw cloud responses, score files, scripts, and run notes. |
| Applied ASR | Provider comparisons for Tarteel, Google Chirp 3, ElevenLabs, Deepgram, and Speechmatics. |

Featured Resources

| Resource | Type | Description |
| --- | --- | --- |
| quranic-asr-benchmark | Dataset | 600-clip leakage-free Quranic ASR benchmark with held-out reciters and real phone recordings. |
| quranic-asr-cloud-rawdata | Dataset | Raw cloud/provider ASR outputs, normalized hypotheses, score tables, scripts, and reproducibility artifacts. |
| quranic-asr-leaderboard | Space | Live leaderboard for comparing ASR systems on the same benchmark and scorer. |

Current Benchmark Snapshot

| System | Overall WER | Notes |
| --- | --- | --- |
| Tarteel official | 10.99 | Official realtime websocket path |
| Google Chirp 3 sync | 11.92 | Speech-to-Text v2, chirp_3, location us |
| Google Chirp 3 realtime | 13.56 | StreamingRecognize with realtime pacing |
| ElevenLabs Scribe v2 | 14.05 | Arabic language hint |
| Deepgram nova-3 | 15.79 | Arabic language hint |
| Speechmatics enhanced | 21.06 | Enhanced operating point |

Scores are produced with the official scorer from quranic-asr-benchmark. Results are reported as WER and CER, plus an alef-insensitive variant of each that ignores differences between alef forms in Quranic orthography.
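To make the metrics concrete, here is a minimal sketch of WER/CER with an optional alef-insensitive mode. It is not the official scorer: the function names, the set of stripped marks, the choice to fold alef variants into bare alef, and the choice to ignore spaces in CER are all assumptions made for illustration. Use the scorer shipped with quranic-asr-benchmark for any reported numbers.

```python
import re
import unicodedata

# Assumed normalization (illustrative only, not the official rules):
# strip diacritics, Quranic annotation marks, superscript alef, and tatweel.
_MARKS = re.compile(r"[\u0610-\u061A\u064B-\u065F\u0670\u06D6-\u06ED\u0640]")
# Alef with madda, hamza above, hamza below, and alef wasla.
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625\u0671]")


def normalize(text, fold_alef=False):
    """Strip marks and collapse whitespace; optionally fold alef variants."""
    text = unicodedata.normalize("NFC", text)
    text = _MARKS.sub("", text)
    if fold_alef:
        text = _ALEF_VARIANTS.sub("\u0627", text)
    return " ".join(text.split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def _rate(ref_units, hyp_units):
    if not ref_units:
        raise ValueError("reference is empty after normalization")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(ref, hyp, fold_alef=False):
    """Word error rate in percent."""
    return _rate(normalize(ref, fold_alef).split(),
                 normalize(hyp, fold_alef).split())


def cer(ref, hyp, fold_alef=False):
    """Character error rate in percent, ignoring spaces (an assumption)."""
    return _rate(list(normalize(ref, fold_alef).replace(" ", "")),
                 list(normalize(hyp, fold_alef).replace(" ", "")))


if __name__ == "__main__":
    ref = "قَالَ إِنَّهُ"
    hyp = "قال انه"
    print(wer(ref, hyp))                  # 50.0 (hamza-below alef vs bare alef)
    print(wer(ref, hyp, fold_alef=True))  # 0.0
```

In this sketch the strict score penalizes a hypothesis that writes a bare alef where the reference has a hamza-carrying alef, while the alef-insensitive score treats the two as equal, which is the kind of orthographic difference the extra metric is meant to separate from genuine recognition errors.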

What We Value

  • Leakage-aware evaluation: test reciters are held out, and test clips are absent from training data.
  • Reproducibility: raw responses, hypotheses, scripts, and score files are published when possible.
  • Responsible data handling: source audio access and redistribution rules are kept with the source benchmark.
  • Quranic Arabic focus: scoring and reporting account for Quranic orthography and recitation-specific challenges.

Repository Layout

| Category | Status |
| --- | --- |
| Public datasets | Benchmark data and cloud ASR outputs are available. |
| Public spaces | Leaderboard is available. |
| Public models | No public models yet. |

Citation

If you use Quran Lab resources, please cite the specific dataset or Space you used. Dataset cards include citation metadata and usage notes.

Contact

For benchmark questions, corrections, rights-holder requests, or collaboration ideas, open a discussion on the relevant dataset or Space.