🗣️ Whisper Large Lao Fine-tuned

Model Overview

This model is OpenAI's Whisper Large fine-tuned for Lao (ພາສາລາວ) automatic speech recognition (ASR). It was trained on `Phonepadith/laos-speech-dataset`, a curated dataset of Lao speech samples with transcriptions.

🧠 Model Details

Property	Description
Base model	`openai/whisper-large-v3`
Fine-tuned by	@Phonepadith
Language	Lao (lo)
Task	Automatic Speech Recognition (ASR)
Framework	🤗 Transformers, PyTorch
Dataset	`Phonepadith/laos-speech-dataset`
Sampling rate	16 kHz
License	MIT (same as base model unless otherwise stated)

📊 Training Details

Training data: Lao speech dataset with 7k+ samples
Input: 16 kHz mono audio
Output: Lao text transcription
Epochs: 6
Batch size: 2
Learning rate: 1e-5
Optimizer: AdamW
Evaluation metric: Word Error Rate (WER)
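
As a rough sanity check on these settings, the number of optimizer updates can be estimated from the dataset size, batch size, and epoch count. The sketch below assumes exactly 7,000 training samples (the card only says "7k+"), a single device, and no gradient accumulation. None of these are confirmed by the card, and `estimate_steps` is a hypothetical helper, not part of any library.

```python
import math


def estimate_steps(num_samples: int, batch_size: int, epochs: int,
                   grad_accum: int = 1) -> int:
    """Estimate the total number of optimizer updates for a training run."""
    # Number of mini-batches needed to see every sample once.
    batches_per_epoch = math.ceil(num_samples / batch_size)
    # With gradient accumulation, several mini-batches share one update.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return updates_per_epoch * epochs


# Assumed values: 7,000 samples (lower bound of "7k+"), batch size 2, 6 epochs.
total_steps = estimate_steps(num_samples=7000, batch_size=2, epochs=6)
print(total_steps)  # 21000
```

Under these assumptions the run performs about 21,000 updates. The real figure depends on the exact split sizes and any accumulation used during training.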

🚀 Usage Example

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import torchaudio

# Load model and processor
model_id = "Phonepadith/whisper-large-lao-finetuned"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

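# Load an audio file and convert it to 16 kHz mono
speech_array, sampling_rate = torchaudio.load("example.wav")
speech_array = speech_array.mean(dim=0)  # average channels so stereo files become mono
speech_array = torchaudio.functional.resample(speech_array, sampling_rate, 16000)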

# Preprocess and generate transcription
input_features = processor(
    speech_array.squeeze().numpy(), 
    sampling_rate=16000, 
    return_tensors="pt"
).input_features

predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

print("Transcription:", transcription)
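
Because the model expects 16 kHz mono input, it can help to inspect a file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module, which reads uncompressed PCM WAV files only. `describe_wav` is a hypothetical helper, and the demo file is generated on the fly for illustration.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read a PCM WAV header and report whether the file is already 16 kHz mono."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration_s = wav_file.getnframes() / sample_rate
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "duration_s": duration_s,
        # True when the audio must be downmixed and/or resampled first.
        "needs_conversion": channels != 1 or sample_rate != 16000,
    }


# Demo: write one second of silence as 44.1 kHz stereo, then inspect it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(44100)
    out.writeframes(b"\x00" * (44100 * 2 * 2))  # frames * channels * bytes per sample

print(describe_wav(demo_path))
```

A file flagged with `needs_conversion` should be downmixed and resampled, as in the usage example, before it is passed to the processor.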

📈 Evaluation Results

Metric	Value
WER (validation)	coming soon
CER	coming soon
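
Until scores are published, both metrics can be computed locally from pairs of reference and predicted transcripts. The sketch below is a plain-Python illustration of the standard edit-distance definitions, not the evaluation script used for this model. The function names are hypothetical, and the sample strings are English placeholders. Lao is usually written without spaces between words, so whitespace-based WER depends on how the transcripts are segmented, and CER may be the more stable figure.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, ignoring spaces."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, list(hypothesis.replace(" ", ""))) / len(ref_chars)


# Placeholder example: one wrong word out of three, one wrong character out of nine.
print(wer("the cat sat", "the cat sit"))
print(cer("the cat sat", "the cat sit"))
```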

🧩 Intended Use

This model is designed for Lao speech-to-text transcription in applications such as:

Voice command systems
Lao language learning apps
Accessibility tools (subtitles, transcripts)
Cultural and linguistic research

⚠️ Limitations

May struggle with code-switching (speech that mixes Lao and English)
Background noise or strong dialectal accents may reduce accuracy
Whisper's built-in tokenizer may occasionally alter Lao text, for example by changing tone marks or spacing
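
One way to keep such tone-mark and spacing differences from distorting comparisons is to normalize both the reference and the model output before scoring them. The sketch below is an assumption about a reasonable scoring-time cleanup, not something the model or its tokenizer performs. `normalize_for_scoring` is a hypothetical helper built on the standard `unicodedata` module.

```python
import unicodedata

# Zero-width characters that can appear in Lao text as invisible break hints.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}


def normalize_for_scoring(text: str) -> str:
    """Apply NFC normalization, drop zero-width characters, collapse whitespace."""
    # NFC puts combining marks (vowel signs, tone marks) into canonical order.
    text = unicodedata.normalize("NFC", text)
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)
    return " ".join(text.split())


# Illustrative strings: the same phrase with different invisible characters and spacing.
reference = "ສະບາຍດີ\u200b ທຸກຄົນ"
hypothesis = "ສະບາຍດີ   ທຸກຄົນ"
print(normalize_for_scoring(reference) == normalize_for_scoring(hypothesis))
```

Whether to strip spaces entirely, or to segment words with a dedicated Lao tokenizer, is a design choice that should be applied identically to references and predictions.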

🪪 Citation

If you use this model in your research, please cite:

@misc{phonepadith2025whisperlao,
  title = {Whisper Large Fine-tuned for Lao ASR},
  author = {Phonepadith Phoummavong},
  year = {2025},
  howpublished = {\url{https://huggingface.co/Phonepadith/whisper-large-lao-finetuned}},
}

💬 Contact

For questions, collaboration, or dataset contributions:

📧 Email: phonepadithpp@gmail.com
🤗 Hugging Face Profile

Note: This model is part of ongoing efforts to improve ASR capabilities for low-resource languages like Lao. Contributions and feedback are welcome!
