Whisper Large Hmong ASR
This repository provides an automatic speech recognition (ASR) model fine-tuned for the Hmong language using Whisper Large.
The model is available on Hugging Face and can be used immediately with the Transformers pipeline.
0. Model Description
This model is a fine-tuned version of Whisper Large for Hmong automatic speech recognition. It was trained on carefully prepared Hmong speech datasets of short to medium-length utterances and is optimized for real-world conversational and general speech.
1. Quick Usage (Transformers Pipeline)
from transformers import pipeline

# Load the fine-tuned Hmong ASR model from the Hugging Face Hub
transcriber = pipeline(
    "automatic-speech-recognition",
    model="Pakorn2112/whisper-model-large-hmong",
)

result = transcriber("hmong_sample.wav")
print(result["text"])
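Whisper checkpoints operate on 16 kHz mono audio. The pipeline handles decoding and resampling when given a file path, but if you preprocess raw audio arrays yourself you may need to resample them first. A minimal linear-interpolation sketch, for illustration only (a real preprocessing step would use librosa or torchaudio instead):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample mono audio via linear interpolation (illustrative only)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Map output index i onto a fractional position in the input
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# One second of 44.1 kHz audio becomes 16000 samples at 16 kHz
one_second_44k = [0.0] * 44100
resampled = resample_linear(one_second_44k, 44100, 16000)
print(len(resampled))  # 16000
```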
2. Dataset and Training Summary
Single Speech Model
- Dataset size: 533 audio files
- Audio duration: 1 to 20 seconds
- Author: Pakorn Archakeeree, localvoice.org
- Evaluation loss: 0.5449
- Word error rate (WER): 12.2924
- Maximum training steps: 10000
Multi Speech Model
- Dataset size: 1017 audio files
- Audio duration: 1 to 20 seconds
- Author: Pakorn Archakeeree, localvoice.org
- Evaluation loss: 0.6393
- Word error rate (WER): 17.6957
- Maximum training steps: 10000
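Word error rate is the word-level edit (Levenshtein) distance between the hypothesis and the reference transcript, divided by the number of reference words; the values above are presumably percentages, i.e. roughly 12.3% and 17.7%. A minimal sketch of how the metric is computed:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[-1][-1] / len(ref)

print(wer("one two three", "one two tree"))  # one substitution in three words
```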
3. Model Details
- Base model: Whisper Large
- Task: Automatic Speech Recognition (ASR)
- Target language: Hmong
- Framework: Hugging Face Transformers
- Format: safetensors
4. Acknowledgements
This work was supported by:
- HPC Ignite
- ThaiSC
- localvoice.org
Special thanks to the localvoice.org community for supporting Hmong language technology development.
5. License
This project is licensed under the Apache License 2.0. Please see the LICENSE file for details.