| 6 |
base_model: openai/whisper-base
|
| 7 |
tags:
|
| 8 |
- generated_from_trainer
|
| 9 |
+
- arabic
|
| 10 |
+
- automatic-speech-recognition
|
| 11 |
+
- quran
|
| 12 |
+
- whisper
|
| 13 |
metrics:
|
| 14 |
- wer
|
| 15 |
+
- cer
|
| 16 |
model-index:
|
| 17 |
- name: Whisper base AR - YA
|
| 18 |
+
results:
|
| 19 |
+
- task:
|
| 20 |
+
type: automatic-speech-recognition
|
| 21 |
+
name: Automatic Speech Recognition
|
| 22 |
+
dataset:
|
| 23 |
+
name: Quran Ayat Speech-to-Text
|
| 24 |
+
type: audio
|
| 25 |
+
metrics:
|
| 26 |
+
- name: WER (Validation)
|
| 27 |
+
type: wer
|
| 28 |
+
value: 0.0405
|
| 29 |
+
- name: CER (Validation)
|
| 30 |
+
type: cer
|
| 31 |
+
value: 0.0195
|
| 32 |
+
- name: WER (Test)
|
| 33 |
+
type: wer
|
| 34 |
+
value: 0.082
|
| 35 |
+
- name: CER (Test)
|
| 36 |
+
type: cer
|
| 37 |
+
value: 0.0327
|
| 38 |
+
pipeline_tag: automatic-speech-recognition
|
| 39 |
---

# Whisper base AR - YA

This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on an Arabic Quran recitation dataset focused on verse-level speech-to-text transcription. The goal was to create a lightweight ASR system that can accurately transcribe Quranic audio into Arabic text, optimized for clear, male recitation audio.

It achieves the following results:

- **Validation set:**
  - **Loss**: 0.0023
  - **WER (Word Error Rate)**: 4.05%
  - **CER (Character Error Rate)**: 1.95%
- **Test set:**
  - **WER (Word Error Rate)**: 8.2%
  - **CER (Character Error Rate)**: 3.27%
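
The WER figures above are standard edit-distance metrics. As a rough illustration only (not this model's actual evaluation code, which would typically use a library such as `evaluate` or `jiwer`), word error rate can be computed like this:

```python
# Minimal WER sketch: word-level Levenshtein distance divided by the
# number of reference words. Illustrative only.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("بسم الله الرحمن الرحيم", "بسم الله الرحمن الرحيم"))  # 0.0
```

CER is computed the same way over characters instead of words.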

## Model description

This model builds upon OpenAI's Whisper base architecture and is fine-tuned specifically for Modern Standard Arabic, with a focus on Quranic verses. Audio samples were cleaned, resampled to 16 kHz, and aligned with text for training.

The model is trained in a supervised setting with Whisper's standard sequence-to-sequence cross-entropy objective, making it suitable for inference in streaming or batch-based ASR systems. Whisper's multilingual capabilities were leveraged to build a domain-specific Arabic transcription model.

## Intended uses & limitations

### Intended uses

- Speech recognition for Arabic Quran recitations
- Educational tools or Quran learning applications
- Mobile-friendly deployment of ASR for religious audio content
- Fine-tuning or distillation for low-resource Arabic ASR projects

### Limitations

- Optimized for clear, male Quran recitation; performance may degrade on female voices or conversational Arabic
- Not designed for dialectal or informal speech
- Background noise or overlapping speakers may reduce accuracy

## Training and evaluation data

The dataset consists of verse-level Quran recitations in Arabic. The recordings are primarily from male speakers with clear tajweed (recitation rules), each aligned to its corresponding Arabic text.

Audio files were resampled to 16 kHz and normalized for Whisper compatibility.

Evaluation was conducted on both a held-out validation set and a separate test set to assess generalization.
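
The 16 kHz resampling step can be sketched as follows. This is an illustrative stand-in, not the card's actual preprocessing (which would typically use `torchaudio` or `librosa` resamplers with proper anti-aliasing):

```python
# Naive linear-interpolation resampling to Whisper's expected 16 kHz.
# Fine as a sketch; production pipelines should use a filtered resampler.
import numpy as np

def resample(audio: np.ndarray, sr_in: int, sr_out: int = 16_000) -> np.ndarray:
    duration = len(audio) / sr_in
    n_out = int(round(duration * sr_out))
    t_in = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    t_out = np.linspace(0.0, duration, num=n_out, endpoint=False)
    return np.interp(t_out, t_in, audio)

tone = np.sin(2 * np.pi * 440 * np.arange(44_100) / 44_100)  # 1 s at 44.1 kHz
print(resample(tone, 44_100).shape)  # (16000,)
```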
## Training procedure

### Training hyperparameters

- `learning_rate`: 0.0001
- `train_batch_size`: 8
- `eval_batch_size`: 8
- `gradient_accumulation_steps`: 2
- `total_train_batch_size`: 16
- `num_train_epochs`: 30
- `seed`: 42
- `lr_scheduler_type`: linear
- `lr_scheduler_warmup_steps`: 500
- `optimizer`: AdamW (betas=(0.9, 0.999), eps=1e-08)
- `mixed_precision_training`: Native AMP
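
As a quick sanity check of the values above (plain arithmetic from the listed hyperparameters; 525 is the logged optimizer steps per epoch):

```python
# Effective batch size = per-device batch size x gradient accumulation steps.
train_batch_size = 8
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 16

# 525 optimizer steps per epoch at an effective batch of 16 implies
# roughly this many training samples per epoch:
steps_per_epoch = 525
approx_train_samples = steps_per_epoch * total_train_batch_size
print(approx_train_samples)  # 8400
```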

Training was conducted using PyTorch with the Hugging Face Trainer API. Metrics monitored include WER and CER.

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Wer    | Cer    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|
| 0.0058        | 1.0   | 525   | 0.0025          | 0.0353 | 0.0177 |
| 0.0018        | 2.0   | 1050  | 0.0031          | 0.0428 | 0.0197 |
| 0.0017        | 3.0   | 1575  | 0.0040          | 0.0511 | 0.0246 |
| 0.001         | 4.0   | 2100  | 0.0039          | 0.0469 | 0.0212 |
| 0.0013        | 5.0   | 2625  | 0.0043          | 0.0505 | 0.0240 |
| 0.0006        | 6.0   | 3150  | 0.0042          | 0.0478 | 0.0223 |
| 0.0007        | 7.0   | 3675  | 0.0049          | 0.0534 | 0.0227 |
| 0.0007        | 8.0   | 4200  | 0.0048          | 0.0552 | 0.0235 |
| 0.0005        | 9.0   | 4725  | 0.0048          | 0.0501 | 0.0218 |
| 0.0005        | 10.0  | 5250  | 0.0048          | 0.0513 | 0.0215 |
| 0.0006        | 11.0  | 5775  | 0.0055          | 0.0528 | 0.0217 |
| 0.0002        | 12.0  | 6300  | 0.0055          | 0.0542 | 0.0232 |
| 0.0003        | 13.0  | 6825  | 0.0056          | 0.0530 | 0.0238 |
| 0.0002        | 14.0  | 7350  | 0.0057          | 0.0498 | 0.0237 |
| 0.0001        | 15.0  | 7875  | 0.0057          | 0.0446 | 0.0189 |
| 0.0003        | 16.0  | 8400  | 0.0054          | 0.0567 | 0.0254 |
| 0.0002        | 17.0  | 8925  | 0.0057          | 0.0540 | 0.0256 |
| 0.0002        | 18.0  | 9450  | 0.0057          | 0.0530 | 0.0239 |
| 0.0           | 19.0  | 9975  | 0.0056          | 0.0478 | 0.0228 |
| 0.0           | 20.0  | 10500 | 0.0055          | 0.0473 | 0.0223 |
| 0.0           | 21.0  | 11025 | 0.0056          | 0.0449 | 0.0202 |
| 0.0           | 22.0  | 11550 | 0.0056          | 0.0461 | 0.0213 |
| 0.0           | 23.0  | 12075 | 0.0057          | 0.0461 | 0.0213 |
| 0.0           | 24.0  | 12600 | 0.0058          | 0.0465 | 0.0218 |
| 0.0           | 25.0  | 13125 | 0.0058          | 0.0474 | 0.0224 |
| 0.0           | 26.0  | 13650 | 0.0059          | 0.0465 | 0.0218 |
| 0.0           | 27.0  | 14175 | 0.0059          | 0.0469 | 0.0219 |
| 0.0           | 28.0  | 14700 | 0.0059          | 0.0461 | 0.0218 |
| 0.0           | 29.0  | 15225 | 0.0054          | 0.0513 | 0.0229 |
| 0.0           | 30.0  | 15750 | 0.0060          | 0.0463 | 0.0217 |

### Framework versions

- Transformers: 4.51.1
- PyTorch: 2.5.1+cu124
- Datasets: 2.20.0
- Tokenizers: 0.21.0