HuBERT Lao ASR

Fine-tuned HuBERT-Large model for Lao automatic speech recognition (ASR), achieving a 25.37% character error rate (CER) on the test split.

Model Details

This model is fine-tuned from facebook/hubert-large-ll60k on the SiangLao/lao-asr-thesis-dataset dataset.

Training Configuration

Epochs: 15
Batch Size: 12
Learning Rate: 3e-5
Training Date: June 3, 2025
Vocabulary Size: 55 Lao characters + special tokens
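
As a rough illustration, the hyperparameters listed above could be mapped onto keyword arguments for the transformers `TrainingArguments` class. This is only a sketch: the card does not say whether the batch size of 12 is per device or an effective batch size, so the `per_device_train_batch_size` mapping is an assumption, and `output_dir` and the helper name `to_training_kwargs` are hypothetical.

```python
# Hyperparameters as reported in this card.
CARD_HPARAMS = {"epochs": 15, "batch_size": 12, "learning_rate": 3e-5}


def to_training_kwargs(hparams, output_dir="hubert-lao-asr-run"):
    """Map the card's hyperparameters to TrainingArguments keyword names.

    Assumption: the reported batch size is per device. `output_dir` is a
    hypothetical path used only for illustration.
    """
    return {
        "output_dir": output_dir,
        "num_train_epochs": hparams["epochs"],
        "per_device_train_batch_size": hparams["batch_size"],
        "learning_rate": hparams["learning_rate"],
    }


training_kwargs = to_training_kwargs(CARD_HPARAMS)

# With transformers installed, these could be passed on as:
#   from transformers import TrainingArguments
#   args = TrainingArguments(**training_kwargs)
```

Settings not reported here (warmup, scheduler, gradient accumulation, and so on) are left at the library defaults in this sketch and may differ from the actual training run.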

Performance

Split	CER	Loss
Test	25.37%	0.652
Validation	25.16%	0.668
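
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between the reference and the predicted transcript, divided by the reference length. The sketch below shows that definition in plain Python. It is not necessarily the exact evaluation code behind the numbers above; the function names are illustrative, and it counts Unicode code points, so Lao vowel and tone marks count as separate characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a CER of 25.37% corresponds to roughly one character edit for every four reference characters.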

Usage

from transformers import HubertForCTC, Wav2Vec2Processor
import torch
import librosa

# Load model and processor
model = HubertForCTC.from_pretrained("SiangLao/hubert-lao-asr")
processor = Wav2Vec2Processor.from_pretrained("SiangLao/hubert-lao-asr")

# Load audio; librosa resamples it to the 16 kHz rate the model expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Generate prediction
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
    predicted_ids = torch.argmax(logits, dim=-1)
    transcription = processor.batch_decode(predicted_ids)[0]

# Clean transcription
transcription = transcription.replace("<unk>", " ").strip()

print(transcription)
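
The `argmax` followed by `batch_decode` in the snippet above amounts to greedy CTC decoding: take the most likely token for each frame, merge consecutive repeats, then drop the blank token. The following sketch shows that collapse step on its own. The blank id of 0 and the tiny vocabulary are hypothetical; in the real model the blank is the tokenizer's padding token, and its id comes from the model's vocabulary.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Merge consecutive repeated ids, then remove the CTC blank id."""
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical vocabulary and per-frame argmax output, for illustration only.
vocab = {0: "<pad>", 1: "ສ", 2: "ະ", 3: "ບ"}
frame_ids = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 0]

token_ids = ctc_greedy_collapse(frame_ids)
text = "".join(vocab[i] for i in token_ids)
```

A blank between two identical ids keeps them as separate characters, which is how CTC represents doubled letters.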

Citation

@thesis{naovalath2025lao,
  title={Lao Automatic Speech Recognition using Transfer Learning},
  author={Souphaxay Naovalath and Sounmy Chanthavong},
  advisor={Dr. Somsack Inthasone},
  school={National University of Laos, Faculty of Natural Sciences, Computer Science Department},
  year={2025}
}

Model size: 0.3B parameters (Safetensors, F32)
