# Whisper Small Fine-Tuned for Egyptian Arabic ASR
This model is a fine-tuned version of OpenAI Whisper Small designed to improve Automatic Speech Recognition (ASR) performance on Egyptian Arabic speech.
The model was trained on Egyptian Arabic speech data to better capture dialectal pronunciation, vocabulary, and speech patterns that are not well represented in the base multilingual Whisper model.
## Model Details
| Property | Value |
|---|---|
| Base Model | openai/whisper-small |
| Task | Automatic Speech Recognition (Speech-to-Text) |
| Language | Arabic (Egyptian Dialect) |
| Framework | Hugging Face Transformers |
| Architecture | Transformer Encoder-Decoder (Whisper) |
## Training Overview
The model was fine-tuned on a dataset containing Egyptian Arabic speech and corresponding transcripts.
The training pipeline included:
- Audio resampling to 16 kHz
- Text normalization
- Filtering invalid samples
- Whisper processor feature extraction
- Fine-tuning using Hugging Face Transformers Trainer
- GPU training with PyTorch
The training process aimed to improve transcription accuracy on dialectal Egyptian Arabic speech.
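The text-normalization step above is not specified in detail; as a rough illustration, the snippet below shows the kind of normalization commonly applied to Arabic transcripts before ASR training (diacritic removal, unifying alef variants, whitespace cleanup). The exact rules used for this model may differ.

```python
import re

# Hypothetical sketch of the "text normalization" step; the actual rules
# used during fine-tuning are not published.
DIACRITICS = re.compile(r"[\u0610-\u061A\u064B-\u065F\u0670]")  # tashkeel marks

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)                         # remove short-vowel diacritics
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)   # alef variants -> bare alef
    return re.sub(r"\s+", " ", text).strip()                # collapse whitespace
```

Consistent normalization of the training transcripts matters because the model learns to emit exactly the text form it was trained on, and WER is computed against that same form.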
## Intended Use
This model is intended for:
- Egyptian Arabic speech transcription
- Arabic speech recognition research
- Voice AI applications
- Speech dataset analysis
- ASR benchmarking for Arabic dialects
Example use cases include:
- Voice assistants
- Transcription tools
- Speech dataset labeling
- Arabic voice interfaces
## Limitations
- The model is specialized for Egyptian Arabic and may perform worse on other Arabic dialects.
- Performance may degrade on:
  - Noisy audio
  - Overlapping speech
  - Low-quality recordings
- Like most ASR systems, the model may misrecognize rare words and proper names.
## Example Usage
```python
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "itshamdi404/Egy_Arabic_whisper-small"

# Load the processor and the fine-tuned model
processor = WhisperProcessor.from_pretrained(model_id, language="ar", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("test.wav", sr=16000)

# Convert the waveform to log-Mel input features
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate token IDs and decode them back to text
with torch.no_grad():
    predicted_ids = model.generate(inputs["input_features"])

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```
## Model Evaluation
The model was evaluated using standard ASR metrics:
| Metric | Description |
|---|---|
| WER | Word Error Rate |
| CER | Character Error Rate |
Evaluation was performed on unseen Egyptian Arabic speech samples.
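As a minimal sketch (not the exact evaluation script used for this model), both metrics reduce to the Levenshtein edit distance, applied at the word level for WER and at the character level for CER:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (one-row dynamic programming)."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(dp[j] + 1,             # deletion
                      dp[j - 1] + 1,         # insertion
                      prev_diag + (r != h))  # substitution (or match)
            prev_diag, dp[j] = dp[j], cur
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat", "the cat sit")` is 1/3: one substituted word out of three reference words. In practice, libraries such as `jiwer` or the `evaluate` package provide these metrics directly.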
## Dataset
The model was fine-tuned using Egyptian Arabic speech data consisting of paired:
- Audio recordings
- Text transcripts
The dataset includes a variety of speakers and natural spoken language.
## Future Improvements
Possible future improvements include:
- Training on larger Egyptian Arabic datasets
- Adding more speaker diversity
- Improving robustness to noisy environments
- Evaluating across multiple Arabic dialects
## Author
Hamdi Mohamed, AI Engineer specializing in:
- Large Language Models (LLMs)
- Speech AI
- Computer Vision
## Citation
If you use this model in your research or project, please cite:
```bibtex
@misc{hamdi2026whisper,
  author = {Hamdi Mohamed},
  title  = {Whisper Small Egyptian Arabic ASR Model},
  year   = {2026},
  url    = {https://huggingface.co/itshamdi404}
}
```