Whisper Large Hmong ASR
This repository provides an automatic speech recognition (ASR) model fine-tuned for the Hmong language using Whisper Large.
The model is available on Hugging Face and can be used immediately with the Transformers pipeline.
0. Model Description
This model is a fine-tuned version of Whisper Large for Hmong automatic speech recognition. It was trained on carefully prepared Hmong speech datasets of short to medium-length utterances and is optimized for real-world conversational and general speech.
1. Quick Usage (Transformers Pipeline)
from transformers import pipeline

# Load the fine-tuned Hmong ASR model from the Hugging Face Hub
transcriber = pipeline(
    "automatic-speech-recognition",
    model="Pakorn2112/whisper-model-large-hmong",
)

result = transcriber("hmong_sample.wav")
print(result["text"])
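Whisper checkpoints operate on 16 kHz mono audio. The pipeline handles decoding and resampling when given a file path, but if you preprocess raw audio arrays yourself you may need to resample them first. A minimal linear-interpolation sketch, for illustration only (a real preprocessing step would use librosa or torchaudio instead):

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample mono audio via linear interpolation (illustrative only)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Map output index i onto a fractional position in the input
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# One second of 44.1 kHz audio becomes 16000 samples at 16 kHz
one_second_44k = [0.0] * 44100
resampled = resample_linear(one_second_44k, 44100, 16000)
print(len(resampled))  # 16000
```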
2. Dataset and Training Summary
Single Speech Model
- Dataset size: 533 audio files
- Audio duration: 1 to 20 seconds
- Author: Pakorn Archakeeree, localvoice.org
- Evaluation loss: 0.5449
- Word error rate (WER): 12.2924
- Maximum training steps: 10000
Multi Speech Model
- Dataset size: 1017 audio files
- Audio duration: 1 to 20 seconds
- Author: Pakorn Archakeeree, localvoice.org
- Evaluation loss: 0.6393
- Word error rate (WER): 17.6957
- Maximum training steps: 10000
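Word error rate is the word-level edit (Levenshtein) distance between the hypothesis and the reference transcript, divided by the number of reference words; the values above are presumably percentages, i.e. roughly 12.3% and 17.7%. A minimal sketch of how the metric is computed:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[-1][-1] / len(ref)

print(wer("one two three", "one two tree"))  # one substitution in three words
```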
3. Model Details
- Base model: Whisper Large
- Task: Automatic Speech Recognition (ASR)
- Target language: Hmong
- Framework: Hugging Face Transformers
- Format: safetensors
4. Acknowledgements
This work was supported by:
- HPC Ignite
- ThaiSC
- localvoice.org
Special thanks to the localvoice.org community for supporting Hmong language technology development.
5. License
This project is licensed under the Apache License 2.0. Please see the LICENSE file for details.