🇰🇭 Khmer Speech Recognition: Wav2Vec2 XLS-R (50 h)

Fine-tuned facebook/wav2vec2-xls-r-300m on the first 50 hours of a Khmer speech corpus with a CTC objective; transcripts are produced by greedy CTC decoding.


Model details

Property            Value
Base model          facebook/wav2vec2-xls-r-300m
Language            Khmer / ភាសាខ្មែរ (km)
Task                Automatic Speech Recognition (ASR)
Training data       First 50 hours of Khmer audio
Input sample rate   16 kHz, mono
Architecture        Wav2Vec2 + CTC head
Framework           🤗 Transformers

How to use

from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
import soundfile as sf
import torch

# Load model
processor = Wav2Vec2Processor.from_pretrained("Vatho/wav2vec2-khmer-xls-r-50h")
model     = Wav2Vec2ForCTC.from_pretrained("Vatho/wav2vec2-khmer-xls-r-50h")
model.eval()

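# Load audio (must be 16 kHz mono WAV)
audio, sr = sf.read("your_audio.wav")
if sr != 16000 or audio.ndim != 1:
    raise ValueError("Expected 16 kHz mono audio; resample or downmix first.")

# Transcribe (pass the file's actual sample rate, verified above)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")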
with torch.no_grad():
    logits = model(**inputs).logits

predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
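
The snippet above expects 16 kHz mono input. For recordings in another format, a conversion step along the following lines may help. This is a minimal sketch, not part of the original model card: the helper name `to_mono_16k` is hypothetical, and it assumes NumPy and SciPy are installed and that multi-channel audio arrives in soundfile's (frames, channels) layout.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sample rate the model expects


def to_mono_16k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Hypothetical helper: downmix to mono and resample to 16 kHz."""
    # soundfile returns multi-channel audio as (frames, channels),
    # so averaging over axis 1 gives a mono signal.
    if audio.ndim == 2:
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        # Polyphase resampling by the reduced ratio TARGET_SR / sr.
        factor = gcd(TARGET_SR, sr)
        audio = resample_poly(audio, TARGET_SR // factor, sr // factor)
    return audio.astype(np.float32)
```

Calling `to_mono_16k(audio, sr)` on the output of `sf.read(...)` should yield an array that can be passed to the processor with `sampling_rate=16000`.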

Text normalisation

Only characters in the Khmer Unicode block (U+1780–U+17FF) are kept. All punctuation, Latin characters, and extra whitespace are stripped before scoring.
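
The rule above could be approximated as follows. This is a sketch under assumptions rather than the exact scoring script: the function name `normalise_khmer` is hypothetical, single spaces between words are assumed to survive (the text only says extra whitespace is stripped), and Khmer punctuation marks that sit inside the block, such as ។ (U+17D4), are left in place because the block range is taken literally.

```python
import re

# Anything that is neither in the Khmer block (U+1780 to U+17FF) nor whitespace.
_NON_KHMER = re.compile(r"[^\u1780-\u17FF\s]")
# Runs of whitespace, collapsed to a single space below.
_WHITESPACE = re.compile(r"\s+")


def normalise_khmer(text: str) -> str:
    """Hypothetical helper: keep Khmer-block characters, tidy whitespace."""
    kept = _NON_KHMER.sub("", text)
    return _WHITESPACE.sub(" ", kept).strip()
```

Applying the same function to both reference and hypothesis transcripts keeps the comparison consistent.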


Limitations

  • Trained on only 50 h of data, so performance may degrade on out-of-domain or noisy speech.
  • No language model rescoring is applied at decode time.
  • Best results on clean, 16 kHz recordings.
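
Without language model rescoring, decoding reduces to the greedy CTC rule: take the most likely label per frame, merge consecutive repeats, then drop blanks. The sketch below illustrates that rule on plain integer label ids. The function name and `blank_id=0` are assumptions for illustration; in Transformers' Wav2Vec2 tokenizer the pad token plays the blank role, and its id depends on the vocabulary.

```python
from itertools import groupby


def greedy_ctc_collapse(frame_ids, blank_id=0):
    """Hypothetical helper: merge repeated frame labels, then drop blanks."""
    # groupby yields one entry per run of identical consecutive labels,
    # so a blank between two equal labels keeps them as separate outputs.
    return [label for label, _ in groupby(frame_ids) if label != blank_id]
```

For example, the frame sequence `[0, 5, 5, 0, 5, 7, 7, 0]` collapses to `[5, 5, 7]`: the blank between the two runs of 5 is what allows a doubled label to survive.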

Training details

Setting             Value
Optimizer           AdamW
Base learning rate  3e-4
Batch size          16
Max steps           20,000
Warmup steps        2,000
CTC loss            ✓
Early stopping      ✓
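
To make the warmup setting concrete, the sketch below computes the learning rate at a given step. The table does not state the shape of the schedule, so this assumes linear warmup followed by linear decay to zero (the default scheduler in Transformers); the function name `lr_at_step` is hypothetical, and early stopping may end training before the decay completes.

```python
def lr_at_step(step, base_lr=3e-4, warmup_steps=2_000, max_steps=20_000):
    """Hypothetical helper: linear warmup, then ASSUMED linear decay to zero."""
    if step < warmup_steps:
        # Ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # Decay from base_lr down to 0 between warmup_steps and max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / (max_steps - warmup_steps)
```

Under these assumptions the rate peaks at 3e-4 at step 2,000 and is back to half that value at step 11,000.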

Citation

If you use this model, please cite the base model:

@article{babu2021xls,
  title     = {XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale},
  author    = {Babu, Arun and Wang, Changhan and Tjandra, Andros and others},
  journal   = {arXiv preprint arXiv:2111.09296},
  year      = {2021}
}