Automatic Speech Recognition
Transformers
Safetensors
Khmer
English
troryongasr
custom_code
Committed by Kimang18
Commit 7fc219a (verified) · 1 Parent(s): 5bbb385

Update README.md

Files changed (1)
  1. README.md +15 -2
README.md CHANGED
@@ -89,10 +89,13 @@ The evaluation assesses two capabilities — language detection and transcriptio
89
 
90
  <!-- This should link to a Dataset Card if possible. -->
91
 
 
 
92
  | Dataset | Language | Testing examples | Description |
93
  | --------- | ---------- | ------------- | - |
94
  | **google/fleurs** | Khmer | 765 | Multi-lingual dataset with Khmer language samples |
95
  | **librispeech.clean** | English | 2620 | Clean speech dataset for English transcription |
 
96
 
97
  **Note:** All evaluation results below are from the **test split** of each dataset. For `google/fleurs`, audio clips longer than 30 seconds are excluded from the evaluation.
98
 
@@ -104,22 +107,26 @@ The evaluation assesses two capabilities — language detection and transcriptio
104
 
105
  **Task:** Given audio input, detect the language.
106
 
 
107
  | Metric | Description |
108
  |--------|-------------|
109
  | **Precision** | Proportion of predicted languages that are correct |
110
  | **Recall** | Proportion of actual language samples correctly identified |
111
  | **Accuracy** | Proportion of total predictions that are correct |
112
  | **F1-score** | Harmonic mean of precision and recall |
 
113
 
114
  ##### Transcription
115
 
116
  **Task:** Convert audio to text (transcription).
117
 
 
118
  | Metric | Description |
119
  |--------|-------------|
120
  | **Token Error Rate** | Proportion of incorrectly transcribed tokens |
121
  | **Character Error Rate (CER)** | Proportion of characters that are incorrect |
122
  | **Word Error Rate (WER)** | Proportion of words that are incorrect |
 
123
 
124
  **Note on Token Error Rate:** Token Error Rate measures the model's ability to predict the next token given the audio input and the preceding tokens. It is a weaker metric than Word Error Rate (WER) and Character Error Rate (CER) because it does not account for insertions, deletions, and substitutions as comprehensively. It is reported here because Khmer text lacks word boundaries, which makes WER and CER calculations challenging without additional preprocessing.
125
 
@@ -130,10 +137,12 @@ The evaluation assesses two capabilities — language detection and transcriptio
130
 
131
  #### Language Detection Results
132
 
 
133
  | Dataset | Precision | Recall | Accuracy | F1-score |
134
  |---------|-----------|--------|----------|----------|
135
  | google/fleurs (Khmer) | 100% | 100% | 100% | 100% |
136
  | librispeech.clean (English) | 100% | 100% | 100% | 100% |
 
137
 
138
  **Key Finding:** Both model sizes achieved perfect language detection performance on both datasets, indicating excellent binary classification capability for distinguishing between Khmer and English audio.
139
 
@@ -142,11 +151,13 @@ The evaluation assesses two capabilities — language detection and transcriptio
142
 
143
  #### Transcription Results
144
 
 
145
  | Metric | Combined (Khmer + English) | Khmer | English |
146
  |--------|---------------------------|-------|---------|
147
  | Token Error Rate | 29% | 56% | 19% |
148
  | Character Error Rate (CER) | 32.89% | 60.71% | 20.98% |
149
  | Word Error Rate (WER) | 46.53% | 86.16% | 31.13% |
 
150
 
151
  **Key Observations:**
152
  - The model shows strong performance on English (19% token error rate, 20.98% CER, 31.13% WER)
@@ -242,12 +253,14 @@ For transcription task, the model was trained on around 140 hours of Khmer audio
242
  Khmer datasets include [`DDD-Cambodia/khm-asr-cultural`](https://huggingface.co/datasets/DDD-Cambodia/khm-asr-cultural) (134.6 hours), [`openslr/openslr`](https://huggingface.co/datasets/Kimang18/openslr-SLR42/blob/main/README.md), and [`google/fleurs`](https://huggingface.co/datasets/Kimang18/google-fleurs-km-kh).
243
  Split `clean.100` of [`openslr/librispeech_asr`](https://huggingface.co/datasets/openslr/librispeech_asr) was used as the English dataset.
244
 
 
245
  | Dataset | Language | Training examples | Validation examples | Description |
246
  | --------- | ---------- | ----------------- | ------------------- |- |
247
  | **openslr/openslr** | Khmer | 2906 | 0 | Multi-speaker TTS data for Khmer language (split `SLR42`) |
248
  | **google/fleurs** | Khmer | 1675 | 324 | TTS data for Khmer language (split `km_kh`) |
249
  | **DDD-Cambodia/khm-asr-cultural** | Khmer | 56716 | 0 | Khmer ASR Cultural Dataset (split `train`) |
250
  | **librispeech.clean** | English | 28539 | 2703 | Clean speech dataset for English transcription |
 
251
 
252
  #### Translation Task
253
 
@@ -291,10 +304,10 @@ The training took around 10 hours.
291
  [More Information Needed]
292
 
293
 
294
- ## Model Card Authors
295
 
 
296
  Name: KHUN Kimang (Ph.D.)
297
- Email: kimang.khun@polytechnique.org
298
 
299
  ## Model Card Contact
300
 
 
89
 
90
  <!-- This should link to a Dataset Card if possible. -->
91
 
92
+ <div align="center">
93
+
94
  | Dataset | Language | Testing examples | Description |
95
  | --------- | ---------- | ------------- | - |
96
  | **google/fleurs** | Khmer | 765 | Multi-lingual dataset with Khmer language samples |
97
  | **librispeech.clean** | English | 2620 | Clean speech dataset for English transcription |
98
+ </div>
99
 
100
  **Note:** All evaluation results below are from the **test split** of each dataset. For `google/fleurs`, audio clips longer than 30 seconds are excluded from the evaluation.
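
The 30-second exclusion described in the note can be sketched as a simple duration filter. This is only an illustration, not the author's evaluation script: the record layout (`num_samples`, `sampling_rate`) and the helper names are assumptions, and "longer than" is read as a strict comparison, so a clip of exactly 30 seconds is kept.

```python
# Hypothetical sketch: drop clips longer than 30 seconds before evaluation.
# The dict keys below are assumed for illustration, not taken from the model card.
MAX_SECONDS = 30.0


def duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Clip duration in seconds, from sample count and sampling rate."""
    return num_samples / sampling_rate


def keep_short_clips(examples, max_seconds: float = MAX_SECONDS):
    """Return only the examples whose duration does not exceed max_seconds."""
    return [
        ex
        for ex in examples
        if duration_seconds(ex["num_samples"], ex["sampling_rate"]) <= max_seconds
    ]


if __name__ == "__main__":
    clips = [
        {"id": "a", "num_samples": 160_000, "sampling_rate": 16_000},  # 10 s
        {"id": "b", "num_samples": 640_000, "sampling_rate": 16_000},  # 40 s
    ]
    print([c["id"] for c in keep_short_clips(clips)])
```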
101
 
 
107
 
108
  **Task:** Given audio input, detect the language.
109
 
110
+ <div align="center">
111
  | Metric | Description |
112
  |--------|-------------|
113
  | **Precision** | Proportion of predicted languages that are correct |
114
  | **Recall** | Proportion of actual language samples correctly identified |
115
  | **Accuracy** | Proportion of total predictions that are correct |
116
  | **F1-score** | Harmonic mean of precision and recall |
117
+ </div>
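
As a rough illustration of the four language detection metrics in the table above, the sketch below computes them for one language treated as the positive class. The function name, the label strings (`"km"`, `"en"`), and the zero-division convention are assumptions for this example; the model card does not say how the reported numbers were computed.

```python
# Hypothetical sketch of the language detection metrics for one target language.
def detection_metrics(y_true, y_pred, positive):
    """Precision, recall, accuracy and F1 with `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Convention assumed here: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    truth = ["km", "km", "en", "en"]
    predicted = ["km", "en", "en", "en"]
    print(detection_metrics(truth, predicted, positive="km"))
```

When every prediction matches its label, as in the results reported for both datasets, all four values equal 1.0.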
118
 
119
  ##### Transcription
120
 
121
  **Task:** Convert audio to text (transcription).
122
 
123
+ <div align="center">
124
  | Metric | Description |
125
  |--------|-------------|
126
  | **Token Error Rate** | Proportion of incorrectly transcribed tokens |
127
  | **Character Error Rate (CER)** | Proportion of characters that are incorrect |
128
  | **Word Error Rate (WER)** | Proportion of words that are incorrect |
129
+ </div>
130
 
131
  **Note on Token Error Rate:** Token Error Rate measures the model's ability to predict the next token given the audio input and the preceding tokens. It is a weaker metric than Word Error Rate (WER) and Character Error Rate (CER) because it does not account for insertions, deletions, and substitutions as comprehensively. It is reported here because Khmer text lacks word boundaries, which makes WER and CER calculations challenging without additional preprocessing.
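
To make the contrast in this note concrete, here is a minimal sketch of the three transcription metrics. WER and CER are written as the usual edit-distance ratios over words and characters. The `token_error_rate` function is only one possible reading of the note (a position-by-position comparison of predicted and reference token IDs, as under teacher forcing); the model card does not give the exact formula, so treat it as an assumption. Splitting on whitespace for WER is also an assumption and is exactly the step that is not meaningful for unsegmented Khmer text.

```python
# Hypothetical sketch: edit-distance-based WER/CER versus a position-wise token error rate.
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate, assuming whitespace-separated words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate over raw characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def token_error_rate(ref_ids, pred_ids) -> float:
    """Assumed definition: share of positions where the predicted token differs."""
    if not ref_ids or len(ref_ids) != len(pred_ids):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for r, p in zip(ref_ids, pred_ids) if r != p) / len(ref_ids)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))
    print(cer("the cat sat", "the cat sit"))
    print(token_error_rate([1, 2, 3, 4], [1, 2, 9, 4]))
```

Under this reading, the position-wise comparison never sees an inserted or dropped token as such, which is one way to understand why the note calls it weaker than WER and CER.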
132
 
 
137
 
138
  #### Language Detection Results
139
 
140
+ <div align="center">
141
  | Dataset | Precision | Recall | Accuracy | F1-score |
142
  |---------|-----------|--------|----------|----------|
143
  | google/fleurs (Khmer) | 100% | 100% | 100% | 100% |
144
  | librispeech.clean (English) | 100% | 100% | 100% | 100% |
145
+ </div>
146
 
147
  **Key Finding:** Both model sizes achieved perfect language detection performance on both datasets, indicating excellent binary classification capability for distinguishing between Khmer and English audio.
148
 
 
151
 
152
  #### Transcription Results
153
 
154
+ <div align="center">
155
  | Metric | Combined (Khmer + English) | Khmer | English |
156
  |--------|---------------------------|-------|---------|
157
  | Token Error Rate | 29% | 56% | 19% |
158
  | Character Error Rate (CER) | 32.89% | 60.71% | 20.98% |
159
  | Word Error Rate (WER) | 46.53% | 86.16% | 31.13% |
160
+ </div>
161
 
162
  **Key Observations:**
163
  - The model shows strong performance on English (19% token error rate, 20.98% CER, 31.13% WER)
 
253
  Khmer datasets include [`DDD-Cambodia/khm-asr-cultural`](https://huggingface.co/datasets/DDD-Cambodia/khm-asr-cultural) (134.6 hours), [`openslr/openslr`](https://huggingface.co/datasets/Kimang18/openslr-SLR42/blob/main/README.md), and [`google/fleurs`](https://huggingface.co/datasets/Kimang18/google-fleurs-km-kh).
254
  Split `clean.100` of [`openslr/librispeech_asr`](https://huggingface.co/datasets/openslr/librispeech_asr) was used as the English dataset.
255
 
256
+ <div align="center">
257
  | Dataset | Language | Training examples | Validation examples | Description |
258
  | --------- | ---------- | ----------------- | ------------------- |- |
259
  | **openslr/openslr** | Khmer | 2906 | 0 | Multi-speaker TTS data for Khmer language (split `SLR42`) |
260
  | **google/fleurs** | Khmer | 1675 | 324 | TTS data for Khmer language (split `km_kh`) |
261
  | **DDD-Cambodia/khm-asr-cultural** | Khmer | 56716 | 0 | Khmer ASR Cultural Dataset (split `train`) |
262
  | **librispeech.clean** | English | 28539 | 2703 | Clean speech dataset for English transcription |
263
+ </div>
264
 
265
  #### Translation Task
266
 
 
304
  [More Information Needed]
305
 
306
 
307
+ ## Model Card Author
308
 
309
+ ឈ្មោះ: បណ្ឌិត ឃុន គីមអាង
310
  Name: KHUN Kimang (Ph.D.)
 
311
 
312
  ## Model Card Contact
313