| --- |
| library_name: mlx |
| license: apache-2.0 |
| language: |
| - km |
| pipeline_tag: automatic-speech-recognition |
| datasets: |
| - seanghay/km-speech-corpus |
| - seanghay/khmer_mpwt_speech |
| tags: |
| - Khmer |
| - mlx |
| base_model: openai-whisper-tiny |
| model-index: |
| - name: whisper-tiny-khmer-mlx-fp16 by Kimang KHUN |
| results: |
| - task: |
| type: automatic-speech-recognition |
| name: Speech Recognition |
| dataset: |
| name: test split of "km_kh" in google/fleurs |
| type: google/fleurs |
| metrics: |
| - type: wer |
| value: 80.2% |
| name: test |
| - task: |
| type: automatic-speech-recognition |
| name: Speech Recognition |
| dataset: |
| name: train split of "SLR42" in openslr/openslr |
| type: openslr/openslr |
| metrics: |
| - type: wer |
| value: 63.2% |
| name: test |
| --- |
| |
| # whisper-tiny-khmer-mlx-fp16 |
| This model was converted to MLX format from [`openai-whisper-tiny`](https://github.com/openai/whisper), then fine-tuned on Khmer speech using two datasets: |
| - [seanghay/khmer_mpwt_speech](https://huggingface.co/datasets/seanghay/khmer_mpwt_speech) |
| - [seanghay/km-speech-corpus](https://huggingface.co/datasets/seanghay/km-speech-corpus) |
|
|
| It achieves the following __word error rate__ (`wer`) on two public datasets: |
| - 80.2% on `test` split of [google/fleurs](https://huggingface.co/datasets/google/fleurs) `km_kh` |
| - 63.2% on `train` split of [openslr/openslr](https://huggingface.co/datasets/openslr/openslr) `SLR42` |
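
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this card does not state which tool or word segmentation produced the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: words are separated by whitespace.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the")
# over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Khmer is normally written without spaces between words, so any WER for Khmer depends on how the text is segmented into words before scoring. Scores computed with a different segmentation may not be directly comparable to the ones reported here.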
|
|
| __NOTE__: The MLX format is intended for Apple silicon (M-series chips). |
|
|
| ## Use with mlx |
| ```bash |
| pip install mlx-whisper |
| ``` |
|
|
| Write a Python script, `example.py`, as follows: |
| ```python |
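| import mlx_whisper |
| |
| # Placeholder: replace with the path to your own audio file |
| SPEECH_FILE_NAME = "speech.wav" |
| |
| result = mlx_whisper.transcribe( |
|     SPEECH_FILE_NAME, |
|     path_or_hf_repo="mlx-community/whisper-tiny-khmer-mlx-fp16", |
|     fp16=True, |
| ) |
| print(result["text"]) |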
| ``` |
| Then run `example.py` to print the transcription. |
|
|
| You can also run it from the command line, replacing `SPEECH_FILE_NAME` with the path to your audio file: |
| ```bash |
| mlx_whisper --model mlx-community/whisper-tiny-khmer-mlx-fp16 --task transcribe SPEECH_FILE_NAME --fp16 True |
| ``` |
|
|