Whisper Small Fine-Tuned for Egyptian Arabic ASR

This model is a fine-tuned version of OpenAI Whisper Small designed to improve Automatic Speech Recognition (ASR) performance on Egyptian Arabic speech.

The model was trained on Egyptian Arabic speech data to better capture dialectal pronunciation, vocabulary, and speech patterns that are not well represented in the base multilingual Whisper model.


Model Details

  • Base Model: openai/whisper-small
  • Task: Automatic Speech Recognition (Speech-to-Text)
  • Language: Arabic (Egyptian dialect)
  • Framework: Hugging Face Transformers
  • Architecture: Transformer encoder-decoder (Whisper)

Training Overview

The model was fine-tuned on a dataset containing Egyptian Arabic speech and corresponding transcripts.

The training pipeline included:

  • Audio resampling to 16 kHz
  • Text normalization
  • Filtering invalid samples
  • Whisper processor feature extraction
  • Fine-tuning using Hugging Face Transformers Trainer
  • GPU training with PyTorch

The training process aimed to improve transcription accuracy on dialectal Egyptian Arabic speech.
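The text normalization and sample-filtering steps above are not specified in detail. As a rough illustration only, a minimal sketch of what such preprocessing might look like is shown below; the exact rules used during training (which diacritics were stripped, which length limits were applied) are assumptions, not the actual recipe:

```python
import re

def normalize_arabic(text: str) -> str:
    """Illustrative Arabic normalization (assumed, not the exact training recipe)."""
    # Strip Arabic diacritics (tashkeel) and the tatweel elongation character
    text = re.sub(r"[\u064B-\u0652\u0640]", "", text)
    # Unify hamza-carrying alef variants to bare alef, as one common example rule
    text = re.sub(r"[\u0622\u0623\u0625]", "\u0627", text)
    # Collapse runs of whitespace
    return re.sub(r"\s+", " ", text).strip()

def is_valid_sample(audio_seconds: float, transcript: str,
                    min_s: float = 0.5, max_s: float = 30.0) -> bool:
    """Illustrative filter: drop empty transcripts and clips outside Whisper's
    30-second input window (hypothetical thresholds)."""
    return bool(transcript.strip()) and min_s <= audio_seconds <= max_s

print(normalize_arabic("مُحَمَّد"))  # diacritics removed -> "محمد"
```

The 30-second upper bound reflects Whisper's fixed input window; longer clips must be chunked or discarded.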


Intended Use

This model is intended for:

  • Egyptian Arabic speech transcription
  • Arabic speech recognition research
  • Voice AI applications
  • Speech dataset analysis
  • ASR benchmarking for Arabic dialects

Example use cases include:

  • Voice assistants
  • Transcription tools
  • Speech dataset labeling
  • Arabic voice interfaces

Limitations

  • The model is specialized for Egyptian Arabic and may perform worse on other Arabic dialects.
  • Performance may degrade on:
    • Noisy audio
    • Overlapping speech
    • Low-quality recordings
  • As with most ASR systems, rare words and proper names may be misrecognized.

Example Usage

import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_id = "itshamdi404/Egy_Arabic_whisper-small"

# Load the processor (feature extractor + tokenizer) and the fine-tuned model
processor = WhisperProcessor.from_pretrained(model_id, language="ar", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_id)
model.eval()

# Whisper expects 16 kHz mono audio; librosa resamples on load
audio, sr = librosa.load("test.wav", sr=16000)

# Convert the waveform to log-Mel input features
inputs = processor(
    audio,
    sampling_rate=16000,
    return_tensors="pt"
)

with torch.no_grad():
    predicted_ids = model.generate(inputs["input_features"])

# Decode the predicted token IDs back to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print(transcription)

Model Evaluation

The model was evaluated using standard ASR metrics:

  • WER: Word Error Rate
  • CER: Character Error Rate

Evaluation was performed on unseen Egyptian Arabic speech samples.
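For reference, WER and CER are both edit-distance metrics. The sketch below is a generic implementation for illustration; it is not the evaluation script used here, and in practice libraries such as jiwer or Hugging Face evaluate are commonly used instead:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling DP row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words
```

Lower is better for both metrics; CER is often more informative for Arabic, where orthographic variation inflates word-level error counts.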


Dataset

The model was fine-tuned using Egyptian Arabic speech data consisting of paired:

  • Audio recordings
  • Text transcripts

The dataset includes a variety of speakers and natural spoken language.


Future Improvements

Possible future improvements include:

  • Training on larger Egyptian Arabic datasets
  • Adding more speaker diversity
  • Improving robustness to noisy environments
  • Evaluating across multiple Arabic dialects

Author

Hamdi Mohamed — AI Engineer specializing in:

  • Large Language Models (LLMs)
  • Speech AI
  • Computer Vision

GitHub · Hugging Face · LinkedIn


Citation

If you use this model in your research or project, please cite:

@misc{hamdi2026whisper,
  author = {Hamdi Mohamed},
  title  = {Whisper Small Egyptian Arabic ASR Model},
  year   = {2026},
  url    = {https://huggingface.co/itshamdi404}
}