Tadabur-Whisper-Small

A Whisper Small model fine-tuned on Tadabur for Qur'anic speech recognition.


Overview

Tadabur-Whisper-Small is a Whisper Small checkpoint fine-tuned on the Tadabur dataset of Qur'anic recitation audio for Arabic speech recognition.


Training Iteration

Step      Epoch    WER ↓
2,500     0.15     13.78%
5,000     0.30     11.20%
7,500     0.44     11.15%
25,000    1.48     7.89%
32,500    1.93     14.75%
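
The WER column reports word error rate, where lower is better. The card does not say how the figures were computed (text normalization, diacritic handling, evaluation tooling), so the following is only a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical and is not part of this model's tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Illustrative helper only: it splits on whitespace and applies no
    text normalization, which may differ from the card's evaluation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One of four reference words is missing from the hypothesis.
print(word_error_rate("بسم الله الرحمن الرحيم", "بسم الله الرحيم"))  # 0.25
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.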

Usage

from transformers import pipeline

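# Build a speech-recognition pipeline and force Arabic decoding
asr = pipeline(
    "automatic-speech-recognition",
    model="FaisaI/tadabur-whisper-small",
    generate_kwargs={"language": "arabic"},
)

# Transcribe one audio file and print the recognized text
result = asr("path/to/audiofile")
print(result["text"])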

Or with the full Whisper API:

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

processor = WhisperProcessor.from_pretrained("FaisaI/tadabur-whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("FaisaI/tadabur-whisper-small")

# Audio must be 16 kHz mono; librosa resamples and downmixes on load
audio_array, sampling_rate = librosa.load("path/to/audiofile", sr=16000, mono=True)
inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")

predicted_ids = model.generate(**inputs, language="arabic")
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
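
The snippet above relies on `librosa.load(..., sr=16000, mono=True)` to meet the 16 kHz mono requirement. If the audio array comes from a different loader, it can help to inspect the file first. The following is a minimal sketch using only the standard-library `wave` module, so it assumes uncompressed WAV input. The names `TARGET_RATE`, `wav_format`, and `needs_conversion` are hypothetical helpers, not part of the model or of `transformers`.

```python
import wave

TARGET_RATE = 16000  # Hz, the sampling rate the usage example expects


def wav_format(path: str) -> tuple[int, int]:
    """Return (sample_rate_hz, channel_count) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def needs_conversion(path: str) -> bool:
    """True if the file is not already 16 kHz mono."""
    rate, channels = wav_format(path)
    return rate != TARGET_RATE or channels != 1
```

A file for which `needs_conversion` returns True should be resampled and downmixed before it is passed to the processor, for example by loading it with `librosa.load` as shown above.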

Limitations

  • Not suitable for speaker identification or diarization.
  • May underperform on noisy or low-quality recordings.
  • Not fully generalized; transcription errors are expected.

Ethical Considerations

This model is trained exclusively on Qur'anic recitation data. Users must engage with outputs respectfully and must not use this model for mockery, distortion, or any disrespectful application involving Qur'anic content.

For research and educational use only.


Citation

@misc{alherran2026tadabur,
  author = {Alherran, Faisal},
  title  = {Tadabur: A Large-Scale Quran Audio Dataset},
  year   = {2026},
  url    = {https://github.com/fherran/tadabur}
}