Whisper Small Fine-Tuned for Egyptian Arabic ASR

This model is a fine-tuned version of OpenAI Whisper Small designed to improve Automatic Speech Recognition (ASR) performance on Egyptian Arabic speech.

The model was trained on Egyptian Arabic speech data to better capture dialectal pronunciation, vocabulary, and speech patterns that are not well represented in the base multilingual Whisper model.


Model Details

  • Base Model: openai/whisper-small
  • Task: Automatic Speech Recognition (Speech-to-Text)
  • Language: Arabic (Egyptian dialect)
  • Framework: Hugging Face Transformers
  • Architecture: Transformer encoder-decoder (Whisper)

Training Overview

The model was fine-tuned on a dataset containing Egyptian Arabic speech and corresponding transcripts.

The training pipeline included:

  • Audio resampling to 16 kHz
  • Text normalization
  • Filtering invalid samples
  • Whisper processor feature extraction
  • Fine-tuning using Hugging Face Transformers Trainer
  • GPU training with PyTorch

The training process aimed to improve transcription accuracy on dialectal Egyptian Arabic speech.
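The text normalization and sample-filtering steps above are not specified in detail. As a rough illustration only, a minimal sketch of what such preprocessing might look like is shown below; the exact rules used during training (which diacritics were stripped, which length limits were applied) are assumptions, not the actual recipe:

```python
import re

def normalize_arabic(text: str) -> str:
    """Illustrative Arabic normalization (assumed, not the exact training recipe)."""
    # Strip Arabic diacritics (tashkeel) and the tatweel elongation character
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)
    # Unify hamza-carrying alef variants to bare alef, as one common example rule
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)
    # Collapse runs of whitespace
    return re.sub(r"\s+", " ", text).strip()

def is_valid_sample(audio_seconds: float, transcript: str,
                    min_s: float = 0.5, max_s: float = 30.0) -> bool:
    """Illustrative filter: drop empty transcripts and clips outside Whisper's
    30-second input window (hypothetical thresholds)."""
    return bool(transcript.strip()) and min_s <= audio_seconds <= max_s

print(normalize_arabic("مُحَمَّد"))  # diacritics removed -> "محمد"
```

The 30-second upper bound reflects Whisper's fixed input window; longer clips must be chunked or discarded.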


Intended Use

This model is intended for:

  • Egyptian Arabic speech transcription
  • Arabic speech recognition research
  • Voice AI applications
  • Speech dataset analysis
  • ASR benchmarking for Arabic dialects

Example use cases include:

  • Voice assistants
  • Transcription tools
  • Speech dataset labeling
  • Arabic voice interfaces

Limitations

  • The model is specialized for Egyptian Arabic and may perform worse on other Arabic dialects.
  • Performance may degrade on:
    • Noisy audio
    • Overlapping speech
    • Low-quality recordings
  • As with most ASR systems, rare words and proper names may be misrecognized.

Example Usage

import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "itshamdi404/Egy_Arabic_whisper-small"

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained(model_id, language="ar", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_id)
model.eval()

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, sr = librosa.load("test.wav", sr=16000)

# Convert the waveform to log-Mel input features
inputs = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt"
)

with torch.no_grad():
    predicted_ids = model.generate(inputs["input_features"])

# Decode the predicted token IDs back to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print(transcription)

Model Evaluation

The model was evaluated using standard ASR metrics:

  • WER: Word Error Rate
  • CER: Character Error Rate

Evaluation was performed on unseen Egyptian Arabic speech samples.
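For reference, WER and CER are both edit-distance metrics. The sketch below is a generic implementation for illustration; it is not the evaluation script used here, and in practice libraries such as jiwer or Hugging Face evaluate are commonly used instead:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words
```

Lower is better for both metrics; CER is often more informative for Arabic, where orthographic variation inflates word-level error counts.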


Dataset

The model was fine-tuned using Egyptian Arabic speech data consisting of paired:

  • Audio recordings
  • Text transcripts

The dataset includes a variety of speakers and natural spoken language.


Future Improvements

Possible future improvements include:

  • Training on larger Egyptian Arabic datasets
  • Adding more speaker diversity
  • Improving robustness to noisy environments
  • Evaluating across multiple Arabic dialects

Author

Hamdi Mohamed — AI Engineer specializing in:

  • Large Language Models (LLMs)
  • Speech AI
  • Computer Vision

GitHub · Hugging Face · LinkedIn


Citation

If you use this model in your research or project, please cite:

@misc{hamdi2026whisper,
  author = {Hamdi Mohamed},
  title  = {Whisper Small Egyptian Arabic ASR Model},
  year   = {2026},
  url    = {https://huggingface.co/itshamdi404}
}