How to use KrorngAI/TrorYongASR-tiny with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="KrorngAI/TrorYongASR-tiny", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("KrorngAI/TrorYongASR-tiny", trust_remote_code=True, dtype="auto")
```
Update README.md
README.md CHANGED
@@ -89,10 +89,13 @@ The evaluation assesses two capabilities — language detection and transcriptio

<!-- This should link to a Dataset Card if possible. -->

| Dataset | Language | Testing examples | Description |
| --------- | ---------- | ------------- | - |
| **google/fleurs** | Khmer | 765 | Multi-lingual dataset with Khmer language samples |
| **librispeech.clean** | English | 2620 | Clean speech dataset for English transcription |

**Note:** All evaluation results below are from the **test split** of each dataset. For `google/fleurs`, audios longer than `30 seconds` are excluded from the evaluation.

@@ -104,22 +107,26 @@ The evaluation assesses two capabilities — language detection and transcriptio

**Task:** Given audio input, detect the language.

| Metric | Description |
|--------|-------------|
| **Precision** | Proportion of predicted languages that are correct |
| **Recall** | Proportion of actual language samples correctly identified |
| **Accuracy** | Proportion of total predictions that are correct |
| **F1-score** | Harmonic mean of precision and recall |

##### Transcription

**Task:** Convert audio to text (transcription).

| Metric | Description |
|--------|-------------|
| **Token Error Rate** | Proportion of incorrectly transcribed tokens |
| **Character Error Rate (CER)** | Proportion of characters that are incorrect |
| **Word Error Rate (WER)** | Proportion of words that are incorrect |

**Note on Token Error Rate:** Token Error Rate measures model's capability in predicting the next token given the audio input and the current sequence of tokens. This metric is weaker than Word Error Rate (WER) and Character Error Rate (CER) because it doesn't account for insertions, deletions, and substitutions as comprehensively. Token Error Rate is used here because Khmer text lacks word boundaries, making WER and CER calculations challenging without additional preprocessing.

@@ -130,10 +137,12 @@ The evaluation assesses two capabilities — language detection and transcriptio

#### Language Detection Results

| Dataset | Precision | Recall | Accuracy | F1-score |
|---------|-----------|--------|----------|----------|
| google/fleurs (Khmer) | 100% | 100% | 100% | 100% |
| librispeech.clean (English) | 100% | 100% | 100% | 100% |

**Key Finding:** Both model sizes achieved perfect language detection performance on both datasets, indicating excellent binary classification capability for distinguishing between Khmer and English audio.

@@ -142,11 +151,13 @@ The evaluation assesses two capabilities — language detection and transcriptio

#### Transcription Results

| Metric | Combined (Khmer + English) | Khmer | English |
|--------|---------------------------|-------|---------|
| Token Error Rate | 29% | 56% | 19% |
| Character Error Rate (CER) | 32.89% | 60.71% | 20.98% |
| Word Error Rate (WER) | 46.53% | 86.16% | 31.13% |

**Key Observations:**
- The model shows strong performance on English (19% token error rate, 20.98% CER, 31.13% WER)

@@ -242,12 +253,14 @@ For transcription task, the model was trained on around 140 hours of Khmer audio
Khmer datasets include [`DDD-Cambodia/khm-asr-cultural`](https://huggingface.co/datasets/DDD-Cambodia/khm-asr-cultural) (134.6 hours), [`openslr/openslr`](https://huggingface.co/datasets/Kimang18/openslr-SLR42/blob/main/README.md), and [`google/fleurs`](https://huggingface.co/datasets/Kimang18/google-fleurs-km-kh).
Split `clean.100` of [`openslr/librispeech_asr`](https://huggingface.co/datasets/openslr/librispeech_asr) was used as English dataset.

| Dataset | Language | Training examples | Validation examples | Description |
| --------- | ---------- | ----------------- | ------------------- |- |
| **openslr/openslr** | Khmer | 2906 | 0 | Multi-speaker TTS data for Khmer language (split `SLR42`) |
| **google/fleurs** | Khmer | 1675 | 324 | TTS data for Khmer language (split `km_kh`) |
| **DDD-Cambodia/khm-asr-cultural** | Khmer | 56716 | 0 | Khmer ASR Cultural Dataset (split `train`) |
| **librispeech.clean** | English | 28539 | 2703 | Clean speech dataset for English transcription |

#### Translation Task

@@ -291,10 +304,10 @@ The training took around 10 hours.
[More Information Needed]


-## Model Card

Name: KHUN Kimang (Ph.D.)
-Email: kimang.khun@polytechnique.org

## Model Card Contact


<!-- This should link to a Dataset Card if possible. -->

+<div align="center">
+
| Dataset | Language | Testing examples | Description |
| --------- | ---------- | ------------- | - |
| **google/fleurs** | Khmer | 765 | Multi-lingual dataset with Khmer language samples |
| **librispeech.clean** | English | 2620 | Clean speech dataset for English transcription |
+</div>

**Note:** All evaluation results below are from the **test split** of each dataset. For `google/fleurs`, audios longer than `30 seconds` are excluded from the evaluation.

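The 30-second exclusion mentioned in the note could be applied with a filter along the following lines. This is only a sketch and is not taken from the model card: the example layout (an `audio` dict holding `array` and `sampling_rate`) is an assumption modeled on the Hugging Face `datasets` audio feature, and `keep_example` is a hypothetical helper name.

```python
# Hypothetical helper: keep only clips that are at most 30 seconds long.
MAX_DURATION_SECONDS = 30.0


def duration_seconds(num_samples: int, sampling_rate: int) -> float:
    """Clip length in seconds, derived from sample count and sampling rate."""
    return num_samples / sampling_rate


def keep_example(example: dict, max_seconds: float = MAX_DURATION_SECONDS) -> bool:
    """Return True when the clip is not longer than max_seconds."""
    audio = example["audio"]  # assumed layout: {"array": [...], "sampling_rate": int}
    return duration_seconds(len(audio["array"]), audio["sampling_rate"]) <= max_seconds


# Two synthetic clips at 16 kHz: one of 2 seconds, one of 31 seconds.
examples = [
    {"id": "short", "audio": {"array": [0.0] * (16000 * 2), "sampling_rate": 16000}},
    {"id": "long", "audio": {"array": [0.0] * (16000 * 31), "sampling_rate": 16000}},
]
kept = [ex["id"] for ex in examples if keep_example(ex)]
print(kept)
```

With a `datasets.Dataset`, the same predicate could be passed to `Dataset.filter`, provided the audio column is decoded into that layout.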

**Task:** Given audio input, detect the language.

+<div align="center">
| Metric | Description |
|--------|-------------|
| **Precision** | Proportion of predicted languages that are correct |
| **Recall** | Proportion of actual language samples correctly identified |
| **Accuracy** | Proportion of total predictions that are correct |
| **F1-score** | Harmonic mean of precision and recall |
+</div>

##### Transcription

**Task:** Convert audio to text (transcription).

+<div align="center">
| Metric | Description |
|--------|-------------|
| **Token Error Rate** | Proportion of incorrectly transcribed tokens |
| **Character Error Rate (CER)** | Proportion of characters that are incorrect |
| **Word Error Rate (WER)** | Proportion of words that are incorrect |
+</div>

**Note on Token Error Rate:** Token Error Rate measures model's capability in predicting the next token given the audio input and the current sequence of tokens. This metric is weaker than Word Error Rate (WER) and Character Error Rate (CER) because it doesn't account for insertions, deletions, and substitutions as comprehensively. Token Error Rate is used here because Khmer text lacks word boundaries, making WER and CER calculations challenging without additional preprocessing.

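One plausible reading of the Token Error Rate described in the note is a position-by-position comparison between the predicted next tokens and the reference tokens. The sketch below follows that reading; the function name, the `ignore_id` padding convention, and the token ids are all assumptions for illustration, not the model card's evaluation code.

```python
from typing import Sequence


def token_error_rate(
    predicted: Sequence[int], target: Sequence[int], ignore_id: int = -100
) -> float:
    """Fraction of scored positions where the predicted token differs from the target.

    Positions whose target equals ignore_id (assumed padding marker) are skipped.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    scored = [(p, t) for p, t in zip(predicted, target) if t != ignore_id]
    if not scored:
        return 0.0
    errors = sum(p != t for p, t in scored)
    return errors / len(scored)


# Four scored positions, one mismatch (9 vs 7); the last two are padding.
target_ids = [12, 7, 42, 5, -100, -100]
predicted_ids = [12, 9, 42, 5, 0, 0]
print(token_error_rate(predicted_ids, target_ids))  # 0.25
```

Because every position is compared in place, a single inserted or dropped token is not realigned the way an edit-distance metric would handle it, which is consistent with the note calling this metric weaker than WER and CER.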

#### Language Detection Results

+<div align="center">
| Dataset | Precision | Recall | Accuracy | F1-score |
|---------|-----------|--------|----------|----------|
| google/fleurs (Khmer) | 100% | 100% | 100% | 100% |
| librispeech.clean (English) | 100% | 100% | 100% | 100% |
+</div>

**Key Finding:** Both model sizes achieved perfect language detection performance on both datasets, indicating excellent binary classification capability for distinguishing between Khmer and English audio.

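For reference, the four language-detection metrics reported here can be computed from true and predicted labels as in the sketch below. The labels `"km"` and `"en"` and the function name are illustrative assumptions, and the sample predictions are synthetic rather than model outputs.

```python
from typing import Dict, Sequence


def detection_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """Precision, recall, accuracy and F1 for one language treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Synthetic example: one Khmer clip is misclassified as English.
y_true = ["km", "km", "km", "en", "en"]
y_pred = ["km", "km", "en", "en", "en"]
metrics = detection_metrics(y_true, y_pred, positive="km")
print(metrics)
```

When every prediction matches its label, all four values equal 1.0, which corresponds to the 100% entries in the table.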

#### Transcription Results

+<div align="center">
| Metric | Combined (Khmer + English) | Khmer | English |
|--------|---------------------------|-------|---------|
| Token Error Rate | 29% | 56% | 19% |
| Character Error Rate (CER) | 32.89% | 60.71% | 20.98% |
| Word Error Rate (WER) | 46.53% | 86.16% | 31.13% |
+</div>

**Key Observations:**
- The model shows strong performance on English (19% token error rate, 20.98% CER, 31.13% WER)

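WER and CER are conventionally defined as the Levenshtein edit distance between reference and hypothesis, divided by the reference length in words or characters. The sketch below implements that standard definition in plain Python; it is not the evaluation script behind the table, and the whitespace tokenization in `wer` is an assumption that suits English but would need a word segmenter for Khmer.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate using whitespace tokenization (assumption: space-delimited text)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over raw characters, spaces included."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {wer(reference, hypothesis):.4f}")
print(f"CER: {cer(reference, hypothesis):.4f}")
```

Both rates are normalized by the reference length, so a hypothesis with many insertions can score above 100%.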
Khmer datasets include [`DDD-Cambodia/khm-asr-cultural`](https://huggingface.co/datasets/DDD-Cambodia/khm-asr-cultural) (134.6 hours), [`openslr/openslr`](https://huggingface.co/datasets/Kimang18/openslr-SLR42/blob/main/README.md), and [`google/fleurs`](https://huggingface.co/datasets/Kimang18/google-fleurs-km-kh).
Split `clean.100` of [`openslr/librispeech_asr`](https://huggingface.co/datasets/openslr/librispeech_asr) was used as English dataset.

+<div align="center">
| Dataset | Language | Training examples | Validation examples | Description |
| --------- | ---------- | ----------------- | ------------------- |- |
| **openslr/openslr** | Khmer | 2906 | 0 | Multi-speaker TTS data for Khmer language (split `SLR42`) |
| **google/fleurs** | Khmer | 1675 | 324 | TTS data for Khmer language (split `km_kh`) |
| **DDD-Cambodia/khm-asr-cultural** | Khmer | 56716 | 0 | Khmer ASR Cultural Dataset (split `train`) |
| **librispeech.clean** | English | 28539 | 2703 | Clean speech dataset for English transcription |
+</div>

#### Translation Task

[More Information Needed]


+## Model Card Author

+ឈ្មោះ: បណ្ឌិត ឃុន គីមអាង
Name: KHUN Kimang (Ph.D.)

## Model Card Contact
