colorFrom: green
colorTo: blue
sdk: static
pinned: false

Quran Lab
Quranic ASR benchmarks, raw cloud results, and reproducible Arabic speech evaluation.

Quranic ASR Benchmark · Cloud ASR Raw Data · Leaderboard

600 benchmark clips · 2.14 hours · 6 provider runs · raw responses, hypotheses, scripts, and scores

Mission

Quran Lab builds high-quality public resources for evaluating and improving ASR on Quranic recitation. Our focus is simple: leakage-free benchmarks, transparent scoring, reproducible artifacts, and practical tools for researchers working on Arabic and Quranic speech.

We care about evaluations that measure generalization rather than memorization, especially on real recitation audio and real-world recording conditions.

At A Glance

| Focus | Current Work |
| --- | --- |
| Benchmarking | Leakage-free Quranic ASR test set with 600 clips across studio, held-out reciter, and real phone-mic sources. |
| Evaluation | Official WER/CER scorer with Quranic normalization and alef-insensitive reporting. |
| Reproducibility | Published hypotheses, raw cloud responses, score files, scripts, and run notes. |
| Applied ASR | Provider comparisons for Tarteel, Google Chirp 3, ElevenLabs, Deepgram, and Speechmatics. |

Featured Resources

| Resource | Type | Description |
| --- | --- | --- |
| quranic-asr-benchmark | Dataset | 600-clip leakage-free Quranic ASR benchmark with held-out reciters and real phone recordings. |
| quranic-asr-cloud-rawdata | Dataset | Raw cloud/provider ASR outputs, normalized hypotheses, score tables, scripts, and reproducibility artifacts. |
| quranic-asr-leaderboard | Space | Live leaderboard for comparing ASR systems on the same benchmark and scorer. |

Current Benchmark Snapshot

| System | Overall WER (%) | Notes |
| --- | --- | --- |
| Tarteel official | 10.99 | Official realtime websocket path |
| Google Chirp 3 sync | 11.92 | Speech-to-Text v2, `chirp_3`, location `us` |
| Google Chirp 3 realtime | 13.56 | StreamingRecognize with realtime pacing |
| ElevenLabs Scribe v2 | 14.05 | Arabic language hint |
| Deepgram nova-3 | 15.79 | Arabic language hint |
| Speechmatics enhanced | 21.06 | Enhanced operating point |

Scores come from the official scorer in quranic-asr-benchmark. Each system is reported with WER and CER, plus an alef-insensitive variant of each metric that discounts differences between alef forms in Quranic orthography.
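To make the difference between strict and alef-insensitive scoring concrete, here is a minimal Python sketch. It is not the official scorer: the normalization rules (NFC composition, dropping tatweel and all combining marks, folding four alef variants to bare alef) and the function names `normalize`, `wer`, and `cer` are assumptions chosen for illustration, and the official Quranic normalization may differ.

```python
import unicodedata

# Assumption: these alef variants are folded to bare alef (U+0627)
# for the alef-insensitive metric. The official scorer may use another set.
ALEF_TABLE = str.maketrans({
    "\u0622": "\u0627",  # alef with madda above
    "\u0623": "\u0627",  # alef with hamza above
    "\u0625": "\u0627",  # alef with hamza below
    "\u0671": "\u0627",  # alef wasla
})


def normalize(text, fold_alef=False):
    """Illustrative normalization: compose, drop tatweel and marks, tidy spaces."""
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\u0640", "")  # tatweel (kashida)
    # Drop nonspacing marks (harakat, Quranic annotation marks).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    if fold_alef:
        text = text.translate(ALEF_TABLE)
    return " ".join(text.split())


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (word lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp, fold_alef=False):
    """Word error rate in percent."""
    ref_words = normalize(ref, fold_alef).split()
    hyp_words = normalize(hyp, fold_alef).split()
    if not ref_words:
        raise ValueError("reference is empty after normalization")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(ref, hyp, fold_alef=False):
    """Character error rate in percent (single spaces count as characters)."""
    ref_chars = normalize(ref, fold_alef)
    hyp_chars = normalize(hyp, fold_alef)
    if not ref_chars:
        raise ValueError("reference is empty after normalization")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


if __name__ == "__main__":
    # Vocalized reference vs. an unvocalized hypothesis that uses bare alef.
    ref = ("\u0625\u0650\u0646\u0651\u064E "
           "\u0671\u0644\u0644\u0651\u064E\u0647\u064E "
           "\u063A\u064E\u0641\u064F\u0648\u0631\u064C")
    hyp = "\u0627\u0646 \u0627\u0644\u0644\u0647 \u063A\u0641\u0648\u0631"
    print(f"strict WER: {wer(ref, hyp):.2f}")
    print(f"alef-insensitive WER: {wer(ref, hyp, fold_alef=True):.2f}")
```

In the demo, two of the three reference words differ from the hypothesis only in their alef form, so the strict metric counts two substitutions while the alef-insensitive metric counts none.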

What We Value

- Leakage-aware evaluation: held-out reciters and clips absent from training data.
- Reproducibility: raw responses, hypotheses, scripts, and score files are published when possible.
- Responsible data handling: source audio access and redistribution rules are kept with the source benchmark.
- Quranic Arabic focus: scoring and reporting account for Quranic orthography and recitation-specific challenges.
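The leakage-aware principle above can be checked mechanically when training metadata is available. The sketch below is one way to do it, assuming each clip is described by a dictionary with hypothetical `reciter_id` and `clip_id` fields. These field names are illustrative and are not taken from the benchmark's actual schema.

```python
def find_leakage(train_rows, test_rows, keys=("reciter_id", "clip_id")):
    """Return, for each key, the sorted values that appear in both splits.

    An empty list for every key means no overlap was found on that key.
    The key names are hypothetical and should be adapted to the real metadata.
    """
    report = {}
    for key in keys:
        train_values = {row[key] for row in train_rows}
        test_values = {row[key] for row in test_rows}
        report[key] = sorted(train_values & test_values)
    return report


if __name__ == "__main__":
    train = [
        {"reciter_id": "r1", "clip_id": "c1"},
        {"reciter_id": "r2", "clip_id": "c2"},
    ]
    test = [
        {"reciter_id": "r3", "clip_id": "c9"},
        {"reciter_id": "r2", "clip_id": "c8"},  # reciter also seen in training
    ]
    print(find_leakage(train, test))
```

A check like this only catches overlap on the identifiers it is given. It cannot detect the same recitation re-uploaded under a different identifier.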

Repository Layout

| Category | Status |
| --- | --- |
| Public datasets | Benchmark data and cloud ASR outputs are available. |
| Public spaces | Leaderboard is available. |
| Public models | No public models yet. |

Citation

If you use Quran Lab resources, please cite the specific dataset or Space you used. Dataset cards include citation metadata and usage notes.

Contact

For benchmark questions, corrections, rights-holder requests, or collaboration ideas, open a discussion on the relevant dataset or Space.