Khmer TTS
This repository contains a Khmer text-to-speech model fine-tuned from facebook/mms-tts-khm.
The model is packaged in Hugging Face Transformers format and can be loaded with VitsModel and AutoTokenizer.
Files
- model.safetensors - fine-tuned VITS model weights.
- config.json, vocab.json, tokenizer files - model and tokenizer configuration.
- examples/inference.py - minimal local inference script.
- eval/benchmark/ - generated benchmark samples, review sheet, manifest, and timing summary.
- training/ - training configuration and local wrapper used for this experiment.
Raw training audio is not included in this release directory.
Usage
pip install -r requirements.txt
python examples/inference.py --text "សួស្តីអ្នកទាំងអស់គ្នា" --output khmer_tts.wav
Or load the model directly:
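import torch
from scipy.io.wavfile import write
from transformers import AutoTokenizer, VitsModel
repo_id = "khmerttsopensource/khmer-tts"
# Load the tokenizer and the fine-tuned VITS model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = VitsModel.from_pretrained(repo_id)
text = "សួស្តីអ្នកទាំងអស់គ្នា"  # "Hello everyone" in Khmer
inputs = tokenizer(text, return_tensors="pt")
# Run inference without gradient tracking; .waveform has shape (batch, num_samples)
with torch.no_grad():
    waveform = model(**inputs).waveform.squeeze().cpu().numpy()
# Save as a WAV file at the model's sampling rate (16 kHz for this model)
write("khmer_tts.wav", rate=model.config.sampling_rate, data=waveform)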
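scipy's `write` stores a float32 array as a 32-bit floating-point WAV. Some players and audio tools handle 16-bit integer PCM more reliably, so the following is a minimal sketch (not part of this repository) that writes 16-bit PCM using only the Python standard library. It assumes the waveform is mono and its values lie in [-1.0, 1.0]; `write_pcm16_wav` is a hypothetical helper name.

```python
import array
import sys
import wave


def write_pcm16_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file."""
    pcm = array.array("h")  # signed 16-bit integers
    for x in samples:
        x = max(-1.0, min(1.0, float(x)))  # clip so scaling cannot overflow int16
        pcm.append(int(round(x * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)  # mono
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

With the model loaded as above, `write_pcm16_wav("khmer_tts_pcm16.wav", waveform, model.config.sampling_rate)` would write the same audio as 16-bit PCM. Iterating sample by sample is slow for long clips; for those, scaling the NumPy array and converting it with `astype("int16")` before calling scipy's `write` achieves the same result.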
Evaluation
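The included benchmark generated 50 samples. RTF is the real-time factor: generation time divided by the duration of the generated audio, so values below 1 mean synthesis runs faster than real time.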
| Metric | Value |
|---|---|
| Success count | 50 |
| Failure count | 0 |
| Failure rate | 0.0 |
| Mean generation time | 0.434978 seconds |
| Mean audio duration | 3.27936 seconds |
| Mean RTF | 0.136449 |
| Min RTF | 0.026531 |
| Max RTF | 0.289309 |
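The benchmark script itself is not shown here, so the following is only a sketch of how summary statistics like those in the table could be computed. It assumes RTF is calculated per sample as generation time over audio duration and then averaged. That assumption fits the reported numbers: the mean RTF (0.136) differs slightly from the ratio of the means (0.435 / 3.279 ≈ 0.133). The functions `real_time_factor` and `summarize` are hypothetical names. Each run is a `(generation_seconds, num_samples)` pair; the generation time could be measured with `time.perf_counter()` around the model call.

```python
from statistics import mean


def real_time_factor(generation_seconds, num_samples, sample_rate=16000):
    """RTF = time spent generating / duration of the generated audio."""
    audio_seconds = num_samples / sample_rate
    return generation_seconds / audio_seconds


def summarize(runs, sample_rate=16000):
    """Summarize (generation_seconds, num_samples) pairs into table-style metrics."""
    rtfs = [real_time_factor(t, n, sample_rate) for t, n in runs]
    return {
        "mean_generation_time": mean(t for t, _ in runs),
        "mean_audio_duration": mean(n / sample_rate for _, n in runs),
        "mean_rtf": mean(rtfs),  # average of per-sample RTFs
        "min_rtf": min(rtfs),
        "max_rtf": max(rtfs),
    }
```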
See eval/benchmark/review_sheet.csv for manual review fields and eval/benchmark/generated/ for generated WAV samples.
Training Summary
- Base model: facebook/mms-tts-khm
- Epochs: 2
- Batch size: 2
- Sample rate: 16000 Hz
- Training seed: 987
Limitations
This is an experimental single-speaker Khmer TTS model. Review pronunciation, naturalness, and text fidelity before production use. The benchmark samples are generated examples, not a full safety or quality evaluation.
License
This release is licensed under cc-by-nc-4.0, matching the non-commercial license of the base MMS Khmer TTS model. Confirm that any downstream use complies with both the base model license and the rights attached to the fine-tuning data.
Citation
If you use this model, cite the MMS work:
@article{pratap2023mms,
title={Scaling Speech Technology to 1,000+ Languages},
author={Pratap, Vineel and Tjandra, Andros and Shi, Bowen and Tomasello, Paden and Babu, Arun and Kundu, Sayani and Elkahky, Ali and Ni, Zhaoheng and Vyas, Apoorv and Fazel-Zarandi, Maryam and Adi, Yossi and Zhang, Xiaohui and Hsu, Wei-Ning and Conneau, Alexis and Auli, Michael},
journal={arXiv},
year={2023}
}