# FaisaI/tadabur-Whisper-Small
Tadabur-Whisper-Small is a fine-tuned version of Whisper Small on the Tadabur dataset, as presented in the paper *Tadabur: A Large-Scale Quran Audio Dataset*.

How to use FaisaI/tadabur-Whisper-Small with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="FaisaI/tadabur-Whisper-Small")
```

Or load the model directly:

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("FaisaI/tadabur-Whisper-Small")
model = AutoModelForSpeechSeq2Seq.from_pretrained("FaisaI/tadabur-Whisper-Small")
```
WER during fine-tuning (best checkpoint marked ⭐):

| Step | Epoch | WER ↓ |
|---|---|---|
| 2,500 | 0.15 | 13.78% |
| 5,000 | 0.30 | 11.20% |
| 7,500 | 0.44 | 11.15% |
| 25,000 | 1.48 | 7.89% ⭐ |
| 32,500 | 1.93 | 14.75% |
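WER in the table above is the standard word error rate: the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of the metric, not the evaluation script used to produce the table:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution in a three-word reference -> WER of 1/3
print(wer("a b c", "a x c"))  # → 0.333...
```

Note that a WER on Arabic text is sensitive to normalization choices (diacritics, orthographic variants), so scores are only comparable under the same preprocessing.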
With the pipeline API:

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="FaisaI/tadabur-Whisper-Small",
    generate_kwargs={"language": "arabic"},
)

result = asr("path/to/audiofile")
print(result["text"])
```
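Whisper's encoder operates on fixed 30-second windows, so a single call pads or truncates to that length. For longer recitations you can pass `chunk_length_s` to the pipeline, or split the waveform yourself. A minimal sketch of manual splitting; the 2-second overlap is an illustrative choice, not a value from this model card:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
WINDOW_S = 30         # Whisper's fixed encoder window
OVERLAP_S = 2         # illustrative overlap to avoid cutting words at seams

def split_audio(audio: np.ndarray,
                window_s: int = WINDOW_S,
                overlap_s: int = OVERLAP_S,
                sample_rate: int = SAMPLE_RATE) -> list[np.ndarray]:
    """Split a 1-D waveform into <= 30 s chunks with a small overlap."""
    window = window_s * sample_rate
    step = (window_s - overlap_s) * sample_rate
    chunks = []
    for start in range(0, len(audio), step):
        chunks.append(audio[start:start + window])
        if start + window >= len(audio):
            break
    return chunks

# Example: a 65-second waveform splits into three overlapping chunks
audio = np.zeros(65 * SAMPLE_RATE, dtype=np.float32)
chunks = split_audio(audio)
print(len(chunks))  # → 3
```

Each chunk can then be fed to the pipeline (e.g. `asr({"raw": chunk, "sampling_rate": 16_000})`) and the transcripts concatenated; with overlap, duplicated words at the seams need deduplication.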
Or with the full Whisper API:
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

processor = WhisperProcessor.from_pretrained("FaisaI/tadabur-Whisper-Small")
model = WhisperForConditionalGeneration.from_pretrained("FaisaI/tadabur-Whisper-Small")

# Audio must be 16 kHz mono
audio_array, sampling_rate = librosa.load("path/to/audiofile", sr=16000, mono=True)

inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(**inputs, language="arabic")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
This model is trained exclusively on Qur'anic recitation data. Users must engage with outputs respectfully and must not use this model for mockery, distortion, or any disrespectful application involving Qur'anic content.
For research and educational use only.
```bibtex
@misc{alherran2026tadabur,
  author        = {Alherran, Faisal},
  title         = {Tadabur: A Large-Scale Quran Audio Dataset},
  year          = {2026},
  eprint        = {2604.18932},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  doi           = {10.48550/arXiv.2604.18932},
  url           = {https://arxiv.org/abs/2604.18932}
}
```
Base model: openai/whisper-small