nielsr HF Staff committed on
Commit 5d9a339 · verified · 1 Parent(s): 97d4b38

Improve model card: Add pipeline tag, language, project/code links, description, and usage

This PR significantly enhances the model card for `recitation-segmenter-v2` by:

* Adding the `pipeline_tag: automatic-speech-recognition` and `language: ar` metadata for better discoverability and context on the Hub.
* Including links to the associated paper ([Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning](https://huggingface.co/papers/2509.00094)), the GitHub repository (https://github.com/obadx/recitations-segmenter), and the project page (https://obadx.github.io/prepare-quran-dataset/).
* Replacing generic placeholders with a detailed model description, intended uses and limitations, and training data information, extracted from the paper abstract and GitHub README.
* Adding relevant `tags` such as `arabic`, `quran`, and `speech-segmentation`.
* Adding a `transformers`-based Python code snippet for easy inference, as provided in the original GitHub repository.
* Including a BibTeX citation for proper academic attribution.

These improvements make the model card more informative, discoverable, and user-friendly.

Files changed (1)
  1. README.md +155 -11
README.md CHANGED
@@ -1,25 +1,31 @@
1
  ---
 
2
  library_name: transformers
3
  license: mit
4
- base_model: facebook/w2v-bert-2.0
5
- tags:
6
- - generated_from_trainer
7
  metrics:
8
  - accuracy
9
  - f1
10
  - precision
11
  - recall
12
  model-index:
13
  - name: recitation-segmenter-v2
14
  results: []
 
 
15
  ---
16
 
17
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
18
- should probably proofread and complete it, then remove this comment. -->
 
19
 
20
- # recitation-segmenter-v2
 
21
 
22
- This model is a fine-tuned version of [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) on an unknown dataset.
23
  It achieves the following results on the evaluation set:
24
  - Accuracy: 0.9958
25
  - F1: 0.9964
@@ -29,18 +35,145 @@ It achieves the following results on the evaluation set:
29
 
30
  ## Model description
31
 
32
- More information needed
33
 
34
  ## Intended uses & limitations
35
 
36
- More information needed
37
 
38
  ## Training and evaluation data
39
 
40
- More information needed
41
 
42
  ## Training procedure
43
 
 
 
44
  ### Training hyperparameters
45
 
46
  The following hyperparameters were used during training:
@@ -61,10 +194,21 @@ The following hyperparameters were used during training:
61
  | 0.0234 | 0.5014 | 550 | 0.9953 | 0.9959 | 0.0185 | 0.9940 | 0.9977 |
62
  | 0.0186 | 0.7521 | 825 | 0.9958 | 0.9964 | 0.0132 | 0.9976 | 0.9951 |
63
 
64
-
65
  ### Framework versions
66
 
67
  - Transformers 4.51.3
68
  - Pytorch 2.2.1+cu121
69
  - Datasets 3.5.0
70
  - Tokenizers 0.21.1
 
1
  ---
2
+ base_model: facebook/w2v-bert-2.0
3
  library_name: transformers
4
  license: mit
 
 
 
5
  metrics:
6
  - accuracy
7
  - f1
8
  - precision
9
  - recall
10
+ tags:
11
+ - generated_from_trainer
12
+ - arabic
13
+ - quran
14
+ - speech-segmentation
15
  model-index:
16
  - name: recitation-segmenter-v2
17
  results: []
18
+ pipeline_tag: automatic-speech-recognition
19
+ language: ar
20
  ---
21
 
22
+ # recitation-segmenter-v2: Quranic Recitation Segmenter
23
+
24
+ This model is a fine-tuned version of [facebook/w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) for segmenting Holy Quran recitations based on pause points (waqf). It was presented in the paper [Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning](https://huggingface.co/papers/2509.00094).
25
 
26
+ Project Page: https://obadx.github.io/prepare-quran-dataset/
27
+ GitHub Repository: https://github.com/obadx/recitations-segmenter
28
 
 
29
  It achieves the following results on the evaluation set:
30
  - Accuracy: 0.9958
31
  - F1: 0.9964
 
35
 
36
  ## Model description
37
 
38
+ The `recitation-segmenter-v2` model is an enhanced AI model capable of segmenting Holy Quran recitations based on pause points (`waqf`) with high accuracy. It is built upon a fine-tuned [Wav2Vec2Bert](https://huggingface.co/docs/transformers/model_doc/wav2vec2-bert) model, performing Sequence Frame Level Classification with a 20-millisecond resolution. This model and its accompanying Python library are designed for high-performance processing of any number and length of Quranic recitations, from a few seconds to several hours, without performance degradation.
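To make the 20-millisecond frame resolution concrete, the sketch below turns a sequence of per-frame labels into sample intervals at 16000 Hz. It is an illustration only: the label convention (1 for speech, 0 for silence) and the helper name `frames_to_intervals` are assumptions, not part of the `recitations-segmenter` API.

```python
FRAME_MS = 20          # frame resolution stated in the model card
SAMPLE_RATE = 16_000   # sample rate stated in the model card
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per frame


def frames_to_intervals(labels):
    """Convert per-frame labels to [start, end) speech intervals in samples.

    Assumption: label 1 marks a speech frame, anything else marks silence.
    """
    intervals = []
    start = None  # index of the first frame of the current speech run
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i
        elif label != 1 and start is not None:
            intervals.append((start * SAMPLES_PER_FRAME, i * SAMPLES_PER_FRAME))
            start = None
    if start is not None:
        # The recording ended while speech was still active.
        intervals.append((start * SAMPLES_PER_FRAME, len(labels) * SAMPLES_PER_FRAME))
    return intervals


if __name__ == "__main__":
    print(frames_to_intervals([0, 1, 1, 0, 0, 1]))
```

Each boundary produced this way is a multiple of 320 samples, which is why segment edges cannot be finer than 20 milliseconds.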
39
+
40
+ Key Features:
41
+ * Segments Quranic recitations according to `waqf` (pause rules).
42
+ * Specifically trained for Quranic recitations.
43
+ * High accuracy, with segment boundaries resolved to 20 milliseconds.
44
+ * Requires only ~3 GB of GPU memory.
45
+ * Capable of processing recitations of any duration without performance loss.
46
+
47
+ The model is part of a larger effort described in the associated paper, aiming to bridge gaps in assessing spoken language for the Holy Quran. This includes an automated pipeline to produce high-quality Quranic datasets and a novel ASR-based approach for pronunciation error detection using a custom Quran Phonetic Script (QPS).
48
 
49
  ## Intended uses & limitations
50
 
51
+ This model is primarily intended for:
52
+ * Automatic segmentation of Holy Quran recitations for educational purposes or content analysis.
53
+ * Building high-quality Quranic audio databases.
54
+ * As a foundational component for larger systems focused on pronunciation error detection and correction for Quran learners.
55
+
56
+ **Limitations**:
57
+ * The segmenter currently treats `sakt` (a very short pause without breath) as a full `waqf` (stop), which may matter for advanced Tajweed analysis that needs to tell the two apart.
58
+ * The model is specifically trained and optimized for Quranic recitations and might not generalize well to other forms of spoken Arabic.
59
 
60
  ## Training and evaluation data
61
 
62
+ The model was fine-tuned on a meticulously collected dataset of Quranic recitations. The data collection process, described in the associated paper, involved a 98% automated pipeline including collection from expert reciters, segmentation at pause points (`waqf`) using a fine-tuned `wav2vec2-BERT` model, transcription of segments, and transcript verification via a novel Tasmeea algorithm. The dataset comprises over 850 hours of audio (~300K annotated utterances).
63
+
64
+ The data preparation involved:
65
+ 1. Downloading Quranic recitations and converting them to Hugging Face Audio Dataset format at 16000 Hz sample rate.
66
+ 2. Pre-segmenting verses from [everyayah.com](https://everyayah.com) based on pauses using `silero-vad-v4`.
67
+ 3. Applying post-processing (e.g., `min_silence_duration_ms`, `min_speech_duration_ms`, `pad_duration_ms`) to refine segments and manual verification for high-quality divisions.
68
+ 4. Applying data augmentation techniques, including time stretching (speeding up/slowing down 40% of recitations) and various audio effects (Aliasing, AddGaussianNoise, BandPassFilter, PitchShift, RoomSimulator, etc.) using the `audiomentations` library.
69
+ 5. Normalizing audio segments to 16000 Hz and chunking them, with a maximum length of 20 seconds, using a sliding window approach for longer segments.
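The chunking in step 5 can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `chunk_bounds` and the 50% overlap between windows are assumptions; only the 16000 Hz sample rate and the 20-second maximum come from the description above.

```python
SAMPLE_RATE = 16_000                 # from the data preparation steps
MAX_SECONDS = 20                     # maximum chunk length from step 5
WINDOW = SAMPLE_RATE * MAX_SECONDS   # 320_000 samples


def chunk_bounds(num_samples, window=WINDOW, stride=WINDOW // 2):
    """Return [start, end) sample bounds that cover a waveform.

    Assumption: windows overlap by half a window; the real stride may differ.
    """
    if num_samples <= window:
        return [(0, num_samples)]  # short segments are kept whole
    bounds = []
    start = 0
    while start + window < num_samples:
        bounds.append((start, start + window))
        start += stride
    bounds.append((start, num_samples))  # last chunk, never longer than window
    return bounds


if __name__ == "__main__":
    # A 50-second segment at 16000 Hz is split into overlapping 20-second windows.
    for start, end in chunk_bounds(50 * SAMPLE_RATE):
        print(start / SAMPLE_RATE, end / SAMPLE_RATE)
```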
70
+
71
+ The training dataset and its augmented version are available on Hugging Face:
72
+ * [Training Data](https://huggingface.co/datasets/obadx/recitation-segmentation)
73
+ * [Augmented Training Data](https://huggingface.co/datasets/obadx/recitation-segmentation-augmented)
74
+
75
+ ## Usage
76
+
77
+ You can use this model with its accompanying Python library, `recitations-segmenter`, which integrates with Hugging Face `transformers`.
78
+
79
+ First, ensure `ffmpeg` and `libsndfile` are installed system-wide.
80
+
81
+ ### Requirements
82
+
83
+ Install `ffmpeg` and `libsndfile` system-wide.
84
+
85
+ #### Linux
86
+
+ ```bash
+ sudo apt-get update
+ sudo apt-get install -y ffmpeg libsndfile1 portaudio19-dev
+ ```
91
+
92
+ #### Windows & Mac
93
+
94
+ You can create an `anaconda` environment and then install these libraries:
95
+
+ ```bash
+ conda create -n segment python=3.12
+ conda activate segment
+ conda install -c conda-forge ffmpeg libsndfile
+ ```
101
+
102
+ ### Via pip
103
+
+ ```bash
+ pip install recitations-segmenter
+ ```
107
+
108
+ ### Sample usage (Python API)
109
+
110
+ Here's a complete example for using the library in Python. A Google Colab example is also available: [Open in Colab](https://colab.research.google.com/drive/1-RuRQOj4l2MA_SG2p4m-afR7MAsT5I22?usp=sharing)
111
+
+ ```python
+ from pathlib import Path
+
+ from recitations_segmenter import segment_recitations, read_audio, clean_speech_intervals
+ from transformers import AutoFeatureExtractor, AutoModelForAudioFrameClassification
+ import torch
+
+ if __name__ == '__main__':
+     device = torch.device('cuda')
+     dtype = torch.bfloat16
+
+     processor = AutoFeatureExtractor.from_pretrained(
+         "obadx/recitation-segmenter-v2")
+     model = AutoModelForAudioFrameClassification.from_pretrained(
+         "obadx/recitation-segmenter-v2",
+     )
+
+     model.to(device, dtype=dtype)
+
+     # Change these to the paths of your Holy Quran recitation files
+     file_paths = [
+         './assets/dussary_002282.mp3',
+         './assets/hussary_053001.mp3',
+     ]
+     waves = [read_audio(p) for p in file_paths]
+
+     # Extract speech intervals, in samples at a 16000 Hz sample rate
+     sampled_outputs = segment_recitations(
+         waves,
+         model,
+         processor,
+         device=device,
+         dtype=dtype,
+         batch_size=8,
+     )
+
+     for out, path in zip(sampled_outputs, file_paths):
+         # Clean the speech intervals by:
+         # * merging small silence durations
+         # * removing small speech durations
+         # * adding padding to each speech duration
+         # Raises:
+         # * NoSpeechIntervals: if the wav is complete silence
+         # * TooHighMinSpeechDruation: if `min_speech_duration` is too high,
+         #   which results in deleting all speech intervals
+         clean_out = clean_speech_intervals(
+             out.speech_intervals,
+             out.is_complete,
+             min_silence_duration_ms=30,
+             min_speech_duration_ms=30,
+             pad_duration_ms=30,
+             return_seconds=True,
+         )
+
+         print(f'Speech Intervals of: {Path(path).name}: ')
+         print(clean_out.clean_speech_intervals)
+         print(f'Is Recitation Complete: {clean_out.is_complete}')
+         print('-' * 40)
+ ```
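As a rough picture of what the three cleaning parameters control, here is a plain-Python sketch that works on intervals in samples. It is not the implementation of `clean_speech_intervals`: the function name `clean_intervals_sketch`, the order of the steps, and the boundary conditions are assumptions chosen for illustration.

```python
SAMPLE_RATE = 16_000  # sample rate used throughout the model card


def ms_to_samples(ms):
    """Convert a duration in milliseconds to a sample count at SAMPLE_RATE."""
    return ms * SAMPLE_RATE // 1000


def clean_intervals_sketch(intervals,
                           min_silence_duration_ms=30,
                           min_speech_duration_ms=30,
                           pad_duration_ms=30):
    """Merge, filter, and pad sorted [start, end) speech intervals (in samples)."""
    min_silence = ms_to_samples(min_silence_duration_ms)
    min_speech = ms_to_samples(min_speech_duration_ms)
    pad = ms_to_samples(pad_duration_ms)

    # 1. Merge neighbours separated by a silence shorter than min_silence.
    merged = []
    for start, end in intervals:
        if merged and start - merged[-1][1] < min_silence:
            merged[-1][1] = end
        else:
            merged.append([start, end])

    # 2. Drop speech runs shorter than min_speech.
    kept = [(s, e) for s, e in merged if e - s >= min_speech]

    # 3. Pad both sides of each interval, clamping the start at zero.
    return [(max(0, s - pad), e + pad) for s, e in kept]


if __name__ == "__main__":
    raw = [(0, 1600), (1700, 3200), (8000, 8100), (16000, 32000)]
    print(clean_intervals_sketch(raw))
```

The sketch does not clamp padded ends to the recording length or resolve overlaps created by padding; a real implementation would need to handle both.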
172
 
173
  ## Training procedure
174
 
175
+ The model was trained as a `Wav2Vec2BertForAudioFrameClassification` model using the `transformers` library. The motivation, methodology, and setup are described in more detail in the "تفاصيل التدريب" ("Training Details") section of the GitHub repository.
176
+
177
  ### Training hyperparameters
178
 
179
  The following hyperparameters were used during training:
 
194
  | 0.0234 | 0.5014 | 550 | 0.9953 | 0.9959 | 0.0185 | 0.9940 | 0.9977 |
195
  | 0.0186 | 0.7521 | 825 | 0.9958 | 0.9964 | 0.0132 | 0.9976 | 0.9951 |
196
 
 
197
  ### Framework versions
198
 
199
  - Transformers 4.51.3
200
  - Pytorch 2.2.1+cu121
201
  - Datasets 3.5.0
202
  - Tokenizers 0.21.1
203
+
204
+ ## Citation
205
+ If you find our work helpful or inspiring, please feel free to cite it.
206
+
+ ```bibtex
+ @article{ibrahim2025automatic,
+   title={Automatic Pronunciation Error Detection and Correction of the Holy Quran's Learners Using Deep Learning},
+   author={Ibrahim, Obad and El-Sayed, Tamer and El-Din, Sherif Amin},
+   journal={arXiv preprint arXiv:2509.00094},
+   year={2025}
+ }
+ ```