YoussefAshmawy committed · Commit 17e3a39 · verified · 1 Parent(s): a13c936

Update README.md

Files changed (1):
  1. README.md +72 -61

README.md CHANGED
@@ -6,92 +6,103 @@ license: apache-2.0
  base_model: openai/whisper-base
  tags:
  - generated_from_trainer
  metrics:
  - wer
  model-index:
  - name: Whisper base AR - YA
- results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->
-
  # Whisper base AR - YA

- This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on the quran-ayat-speech-to-text dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.0023
- - Wer: 0.0405
- - Cer: 0.0195

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 16
- - optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_steps: 500
- - num_epochs: 30
- - mixed_precision_training: Native AMP

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
- |:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|
- | 0.0058 | 1.0 | 525 | 0.0025 | 0.0353 | 0.0177 |
- | 0.0018 | 2.0 | 1050 | 0.0031 | 0.0428 | 0.0197 |
- | 0.0017 | 3.0 | 1575 | 0.0040 | 0.0511 | 0.0246 |
- | 0.001 | 4.0 | 2100 | 0.0039 | 0.0469 | 0.0212 |
- | 0.0013 | 5.0 | 2625 | 0.0043 | 0.0505 | 0.0240 |
- | 0.0006 | 6.0 | 3150 | 0.0042 | 0.0478 | 0.0223 |
- | 0.0007 | 7.0 | 3675 | 0.0049 | 0.0534 | 0.0227 |
- | 0.0007 | 8.0 | 4200 | 0.0048 | 0.0552 | 0.0235 |
- | 0.0005 | 9.0 | 4725 | 0.0048 | 0.0501 | 0.0218 |
- | 0.0005 | 10.0 | 5250 | 0.0048 | 0.0513 | 0.0215 |
- | 0.0006 | 11.0 | 5775 | 0.0055 | 0.0528 | 0.0217 |
- | 0.0002 | 12.0 | 6300 | 0.0055 | 0.0542 | 0.0232 |
- | 0.0003 | 13.0 | 6825 | 0.0056 | 0.0530 | 0.0238 |
- | 0.0002 | 14.0 | 7350 | 0.0057 | 0.0498 | 0.0237 |
- | 0.0001 | 15.0 | 7875 | 0.0057 | 0.0446 | 0.0189 |
- | 0.0003 | 16.0 | 8400 | 0.0054 | 0.0567 | 0.0254 |
- | 0.0002 | 17.0 | 8925 | 0.0057 | 0.0540 | 0.0256 |
- | 0.0002 | 18.0 | 9450 | 0.0057 | 0.0530 | 0.0239 |
- | 0.0 | 19.0 | 9975 | 0.0056 | 0.0478 | 0.0228 |
- | 0.0 | 20.0 | 10500 | 0.0055 | 0.0473 | 0.0223 |
- | 0.0 | 21.0 | 11025 | 0.0056 | 0.0449 | 0.0202 |
- | 0.0 | 22.0 | 11550 | 0.0056 | 0.0461 | 0.0213 |
- | 0.0 | 23.0 | 12075 | 0.0057 | 0.0461 | 0.0213 |
- | 0.0 | 24.0 | 12600 | 0.0058 | 0.0465 | 0.0218 |
- | 0.0 | 25.0 | 13125 | 0.0058 | 0.0474 | 0.0224 |
- | 0.0 | 26.0 | 13650 | 0.0059 | 0.0465 | 0.0218 |
- | 0.0 | 27.0 | 14175 | 0.0059 | 0.0469 | 0.0219 |
- | 0.0 | 28.0 | 14700 | 0.0059 | 0.0461 | 0.0218 |
- | 0.0 | 29.0 | 15225 | 0.0054 | 0.0513 | 0.0229 |
- | 0.0 | 30.0 | 15750 | 0.0060 | 0.0463 | 0.0217 |

  ### Framework versions

- - Transformers 4.51.1
- - Pytorch 2.5.1+cu124
- - Datasets 2.20.0
- - Tokenizers 0.21.0
  base_model: openai/whisper-base
  tags:
  - generated_from_trainer
+ - arabic
+ - automatic-speech-recognition
+ - quran
+ - whisper
  metrics:
  - wer
+ - cer
  model-index:
  - name: Whisper base AR - YA
+   results:
+   - task:
+       type: automatic-speech-recognition
+       name: Automatic Speech Recognition
+     dataset:
+       name: Quran Ayat Speech-to-Text
+       type: audio
+     metrics:
+     - name: WER (Validation)
+       type: wer
+       value: 0.0405
+     - name: CER (Validation)
+       type: cer
+       value: 0.0195
+     - name: WER (Test)
+       type: wer
+       value: 0.082
+     - name: CER (Test)
+       type: cer
+       value: 0.0327
+ pipeline_tag: automatic-speech-recognition
  ---

  # Whisper base AR - YA

+ This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on an Arabic Quran recitation dataset focused on verse-level speech-to-text transcription. The goal was to create a lightweight ASR system that accurately transcribes Quranic audio into Arabic text, optimized for clear, male recitation audio.
+
+ It achieves the following results:
+ - **Validation set:**
+   - **Loss**: 0.0023
+   - **WER (Word Error Rate)**: 4.05%
+   - **CER (Character Error Rate)**: 1.95%
+ - **Test set:**
+   - **WER (Word Error Rate)**: 8.2%
+   - **CER (Character Error Rate)**: 3.27%
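The WER and CER figures above are edit-distance metrics. As a minimal plain-Python sketch of their definitions (not the exact evaluation code used here; libraries such as `jiwer` or `evaluate` are typical in practice):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (single-row DP)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # row 0: distance from empty prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,      # deletion
                        dp[j - 1] + 1,  # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: char-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```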
 
  ## Model description

+ This model builds upon OpenAI's Whisper base architecture and is fine-tuned specifically for Modern Standard Arabic, with a focus on Quranic verses. Audio samples were cleaned, resampled to 16 kHz, and aligned with their transcripts for training.
+
+ The model was fine-tuned in a supervised setting with the standard sequence-to-sequence cross-entropy objective (Whisper is an encoder-decoder model, not a CTC model), making it suitable for inference in streaming or batch-based ASR systems. Whisper's multilingual pretraining was leveraged to build a domain-specific Arabic transcription model.
 
  ## Intended uses & limitations

+ ### Intended uses
+ - Speech recognition for Arabic Quran recitations
+ - Educational tools and Quran learning applications
+ - Mobile-friendly deployment of ASR for religious audio content
+ - Fine-tuning or distillation for low-resource Arabic ASR projects
+
+ ### Limitations
+ - Optimized for clear, male Quran recitation; performance may degrade on female voices or conversational Arabic
+ - Not designed for dialectal or informal speech
+ - Background noise or overlapping speakers may reduce accuracy
  ## Training and evaluation data

+ The dataset consists of verse-level Quran recitations in Arabic, recorded primarily by male speakers with clear tajweed (recitation rules) and aligned to the corresponding Arabic text.
+
+ Audio files were resampled to 16 kHz and normalized for Whisper compatibility.
+
+ Evaluation was conducted on both a held-out validation set and a separate test set to assess generalization.
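The resampling step would normally be done with a library such as `torchaudio` or `librosa`; purely to illustrate what "resampling to 16 kHz" means, here is a naive linear-interpolation resampler (a sketch only; production pipelines use proper sinc/polyphase filtering):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (illustrative only;
    real pipelines use sinc/polyphase filters, e.g. torchaudio)."""
    if src_rate == dst_rate:
        return list(samples)
    ratio = src_rate / dst_rate
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * ratio               # fractional position in the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Upsample a 1-second 8 kHz signal to the 16 kHz Whisper expects.
audio_8k = [0.0] * 8000
audio_16k = resample_linear(audio_8k, 8000, 16000)
```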
 
  ## Training procedure

  ### Training hyperparameters

+ - `learning_rate`: 0.0001
+ - `train_batch_size`: 8
+ - `eval_batch_size`: 8
+ - `gradient_accumulation_steps`: 2
+ - `total_train_batch_size`: 16
+ - `num_train_epochs`: 30
+ - `seed`: 42
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_warmup_steps`: 500
+ - `optimizer`: AdamW (betas=(0.9, 0.999), eps=1e-08)
+ - `mixed_precision_training`: Native AMP
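These hyperparameters map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows (a hypothetical reconstruction, not the exact script used; `output_dir` is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the training configuration implied by the card above;
# output_dir is a placeholder, not the author's actual path.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-ar-ya",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,   # effective train batch size 16
    num_train_epochs=30,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    fp16=True,                       # native AMP mixed precision
)
```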
 
+ Training was conducted using PyTorch with the Hugging Face Trainer API. Metrics monitored include WER and CER.
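With 525 optimizer steps per epoch over 30 epochs (15,750 total steps) and 500 warmup steps, the linear scheduler behaves as sketched below (a plain-Python rendering of the usual warmup-then-linear-decay rule; the `transformers` implementation may differ in edge cases):

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=500, total_steps=15750):
    """Linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)
```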

+ ### Training results

+ (Refer to the detailed epoch-wise table above.)

  ### Framework versions

+ - Transformers: 4.51.1
+ - PyTorch: 2.5.1+cu124
+ - Datasets: 2.20.0
+ - Tokenizers: 0.21.0