---
language: ar
license: apache-2.0
tags:
- whisper
- automatic-speech-recognition
- asr
- audio
- arabic
- egyptian-arabic
datasets:
- MAdel121/arabic-egy-cleaned
metrics:
- wer
- cer
base_model: openai/whisper-medium
pipeline_tag: automatic-speech-recognition
library_name: transformers
model-index:
- name: whisper-medium-egy
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: MAdel121/arabic-egy-cleaned (validation split)
      type: MAdel121/arabic-egy-cleaned
      config: ar
      split: validation
    metrics:
    - name: WER
      type: wer
      value: 18.029990439289488
    - name: CER
      type: cer
      value: 13.375029793807732
---

# Whisper Medium Egyptian Arabic (whisper-medium-egy)

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium), trained on approximately 72 hours of Egyptian Arabic speech from the `MAdel121/arabic-egy-cleaned` dataset. It is intended for automatic speech recognition (ASR) of the Egyptian Arabic dialect.

## Model Description

* **Base Model:** `openai/whisper-medium`
* **Language:** Arabic (`ar`), specifically the Egyptian dialect (`arz`)
* **Fine-tuning Dataset:** `MAdel121/arabic-egy-cleaned` (approx. 72 hours)
* **Total Training Steps:** 7299
* **Epochs:** 10

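As a rough consistency check, the step and epoch counts above can be related to the batch settings listed under Training Procedure (loader batch size 8, gradient accumulation 2). This sketch assumes that each of the 7299 steps is one optimizer update over an effective batch of 16; the card does not state this, and if the logged steps count loader batches instead, the per-epoch example count halves.

```python
def samples_per_epoch(total_steps: int, epochs: int, batch_size: int, grad_accum: int) -> float:
    """Approximate number of training examples processed per epoch.

    Assumes each step in `total_steps` is one optimizer update covering
    batch_size * grad_accum examples (an assumption, not confirmed by the card).
    """
    steps_per_epoch = total_steps / epochs
    return steps_per_epoch * batch_size * grad_accum


# Figures from this card: 7299 steps, 10 epochs, loader batch size 8, accumulation 2.
per_epoch = samples_per_epoch(7299, 10, 8, 2)
print(f"~{7299 / 10:.0f} optimizer steps per epoch, ~{per_epoch:,.0f} examples per epoch")
```
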
## Intended Uses & Limitations

This model is intended for transcribing Egyptian Arabic speech.

**Intended Use:**
* Automatic transcription of audio recordings and live speech in Egyptian Arabic.
* Assisting with content creation, subtitling, and voice-controlled applications for Egyptian Arabic speakers.

**Limitations:**
* Performance may degrade in very noisy environments or on speech with strong non-Egyptian accents.
* The model was fine-tuned on a single dataset; its performance on substantially different domains or audio conditions may vary.
* The training data primarily consists of [describe your dataset sources/domains if possible, e.g., "YouTube videos", "audiobooks", "scripted conversations"]. Performance might be better on similar types of audio.

## How to Use

You can run this model with the `transformers` library through the `pipeline` interface.

```python
from transformers import pipeline
import torch

# Use the first GPU if one is available, otherwise fall back to the CPU.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    "automatic-speech-recognition",
    model="YOUR_HF_USERNAME/whisper-medium-egy",  # Replace YOUR_HF_USERNAME with your Hugging Face username
    device=device,
)

# Example with a local audio file
# audio_file = "path/to/your/egyptian_arabic_audio.wav"
# transcription = pipe(audio_file, generate_kwargs={"language": "arabic"})["text"]
# print(transcription)

# Example with a Hugging Face dataset audio sample
# from datasets import load_dataset, Audio
# ds = load_dataset("MAdel121/arabic-egy-cleaned", "ar", split="validation")  # Or your test split
# ds = ds.cast_column("audio", Audio(sampling_rate=16000))  # Whisper expects 16 kHz audio
# sample = ds[0]["audio"]  # Make sure your dataset has an "audio" column
# result = pipe(sample.copy(), generate_kwargs={"language": "arabic"})  # copy: the pipeline may modify the dict
# print(result["text"])
```

Replace `"YOUR_HF_USERNAME/whisper-medium-egy"` with the actual model ID on the Hugging Face Hub. Passing `generate_kwargs={"language": "arabic"}` makes Whisper decode with the Arabic language token instead of relying on automatic language detection, which may pick the wrong language for dialectal speech.

## Training Data

The model was fine-tuned on the `MAdel121/arabic-egy-cleaned` dataset available on the Hugging Face Hub. This dataset contains approximately 72 hours of Egyptian Arabic audio paired with transcripts.

## Training Procedure

The model was trained with the `transformers` library using the following key hyperparameters:

* **Base Model:** `openai/whisper-medium`
* **Optimizer:** AdamW
* **Learning Rate:** 1e-5 (0.00001)
* **Warmup Steps:** 1000
* **Weight Decay:** 0.05
* **Gradient Accumulation Factor:** 2
* **Batch Size (`loader_batch_size`):** 8 (effective batch size 8 × 2 = 16)
* **Number of Epochs:** 10
* **Max Grad Norm:** 5
* **Augmentations Used:**
  * `use_drop_freq`: true
  * `use_drop_chunk`: true
  * `use_drop_bit_resolution`: true
  * All other augmentations (`use_add_noise`, `use_speed_perturb`, `use_pitch_shift`, `use_add_reverb`, `use_codec_augment`, `use_gain`) were set to `false`.
* **Task:** transcribe
* **Language:** ar
* **Seed:** 1986

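The warmup setting can be illustrated with a small schedule function. The card states only the peak learning rate (1e-5) and the 1000 warmup steps; the linear decay to zero after warmup, and the use of 7299 as the final step, are assumptions here. This shape matches the common "linear" scheduler in `transformers`, but the training script may use a different one.

```python
def lr_at_step(step: int, peak_lr: float = 1e-5, warmup_steps: int = 1000,
               total_steps: int = 7299) -> float:
    """Linear warmup to peak_lr, then (assumed) linear decay to 0 at total_steps."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: ramp linearly from peak_lr down to 0 (assumed schedule shape).
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for s in (0, 500, 1000, 4000, 7299):
    print(s, f"{lr_at_step(s):.2e}")
```

Warmup keeps early updates small while the newly initialized optimizer statistics settle, which helps avoid disrupting the pretrained Whisper weights at the start of fine-tuning.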
Training was run on a single NVIDIA A100 (80 GB) GPU on Modal Labs.

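On a single GPU, the effective batch of 16 comes from accumulating gradients over 2 loader batches of 8 before each AdamW update, and the max grad norm of 5 caps the global gradient norm. The pure-Python sketch below shows that arithmetic on plain lists of gradient values; the helper names are hypothetical, and in PyTorch the same steps are typically done by scaling each loss by `1 / accum_steps`, calling `loss.backward()`, and then `torch.nn.utils.clip_grad_norm_`. Whether the linked script does exactly this is not stated in the card.

```python
import math
from typing import List


def accumulate(micro_batch_grads: List[List[float]]) -> List[float]:
    """Average per-parameter gradients over micro-batches (hypothetical helper).

    Equivalent to scaling each equally sized micro-batch loss by
    1 / len(micro_batch_grads) and summing the resulting gradients.
    """
    n = len(micro_batch_grads)
    return [sum(values) / n for values in zip(*micro_batch_grads)]


def clip_by_global_norm(grads: List[float], max_norm: float = 5.0) -> List[float]:
    """Rescale grads so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# Two micro-batches of 8 examples each -> one update over an effective batch of 16.
grads = accumulate([[6.0, 8.0], [6.0, 8.0]])
print(grads, clip_by_global_norm(grads))  # global norm 10 is scaled down to 5
```
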
Training was tracked with Weights & Biases under the project `whisper-medium-egyptian-arabic` (resume ID `r3sz4v27`).

## Training Code

The training script is available on [GitHub](https://github.com/moadel321/Fine-tuning-whisper-on-Modal-Labs-with-speech-brain-augmentations-/blob/c85312785faa2b927cbc217fe43acb8ed660d2ee/train_whisper_modal.py).

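The repository name indicates that the augmentations come from SpeechBrain. To give a rough idea of what two of the enabled options do to a waveform, here is a hypothetical pure-Python sketch; it is not SpeechBrain's implementation, and the chunk count, chunk length, and bit depth are made-up values. Chunk dropping zeroes out short random segments of the signal, and bit-resolution dropping re-quantizes samples to a coarser grid. `use_drop_freq`, which removes random frequency bands with notch filters, is not sketched.

```python
import math
import random
from typing import List


def drop_chunks(wave: List[float], n_chunks: int, chunk_len: int, rng: random.Random) -> List[float]:
    """Zero out n_chunks random segments of chunk_len samples (hypothetical helper)."""
    out = list(wave)
    for _ in range(n_chunks):
        start = rng.randrange(0, max(1, len(out) - chunk_len + 1))
        for i in range(start, min(start + chunk_len, len(out))):
            out[i] = 0.0
    return out


def drop_bit_resolution(wave: List[float], bits: int) -> List[float]:
    """Quantize samples in [-1, 1] to a signed `bits`-bit grid and back (hypothetical helper)."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    return [round(max(-1.0, min(1.0, x)) * levels) / levels for x in wave]


rng = random.Random(1986)  # seed value borrowed from the training config for illustration
wave = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]  # 1 s tone at 16 kHz
augmented = drop_bit_resolution(drop_chunks(wave, n_chunks=3, chunk_len=800, rng=rng), bits=8)
```

Both transforms keep the waveform length unchanged, so they can be applied on the fly before feature extraction without touching the transcripts.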
## Weights & Biases

The run can be found in the W&B project at <https://wandb.ai/m-adelomar1/whisper-medium-egyptian-arabic/>.

## Evaluation Results

The model was evaluated on the `validation` split of the `MAdel121/arabic-egy-cleaned` dataset:

* **Word Error Rate (WER):** 18.03%
* **Character Error Rate (CER):** 13.38%

Lower values are better for both metrics.

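WER and CER are both edit-distance rates: the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words (WER) or characters (CER). A minimal sketch follows. The card does not say which library or text normalization produced the scores above, so numbers from this helper may differ slightly (for example, depending on whether spaces count as characters in CER).

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes a non-empty reference."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, counting spaces; assumes a non-empty reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One inserted word against a three-word Egyptian Arabic reference.
print(wer("انا رايح الشغل", "انا رايح الشغل دلوقتي"))
```

Multiply by 100 to compare with the percentages reported above. Corpus-level scores are normally computed as total edits divided by total reference length across all utterances, rather than as an average of per-utterance rates.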
## BibTeX Citation

```bibtex
@misc{madel_2025_whisper_medium_egy,
  author       = {Madel},
  title        = {Whisper Medium Fine-tuned for Egyptian Arabic},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Hub},
  howpublished = {\url{https://huggingface.co/MAdel121/whisper-medium-egy}}
}
```