
Revolab VITS — Multi-Speaker Bahasa Melayu TTS

VITS voice models trained on Revolab Malay speech datasets.

Speakers (2 production-quality)

| Speaker ID | Name  | Samples | CER    | WER    |
|------------|-------|---------|--------|--------|
| sarah      | sarah | 27,792  | 0.0537 | 0.1835 |
| paan       | Paan  | 27,434  | 0.0681 | 0.1561 |

Both speakers score a CER below 0.10 (10%), the threshold used here for production quality.
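
The README does not say how CER and WER were measured. A common approach for TTS, assumed here, is to transcribe the synthesized audio with an ASR model and compare the transcript against the input text. The sketch below shows only the standard definitions (edit distance divided by reference length). The function names are illustrative and are not part of this repository.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters or words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits / reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

With the reference "selamat pagi semua" and the hypothesis "selamat pagi semue", one character out of 18 differs and one word out of 3 differs, so the CER is 1/18 and the WER is 1/3. This also illustrates why WER is usually higher than CER for the same output, as in the table above.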

Structure

speakers.json              # Speaker registry with eval metrics
speakers/
  <name>/
    model.onnx             # ONNX export for inference
    model.onnx.json        # Phoneme config
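
As a minimal sketch of working with this layout, the snippet below walks `speakers/` and collects the model and config paths for each speaker directory. It relies only on the file names shown above and does not read `speakers.json`, whose schema is not documented here. The function name `find_speaker_models` is hypothetical.

```python
from pathlib import Path


def find_speaker_models(repo_root):
    """Map each speaker directory name to its model and config paths.

    Directories missing either file are skipped.
    """
    speakers_dir = Path(repo_root) / "speakers"
    found = {}
    for speaker_dir in sorted(p for p in speakers_dir.iterdir() if p.is_dir()):
        model = speaker_dir / "model.onnx"
        config = speaker_dir / "model.onnx.json"
        if model.is_file() and config.is_file():
            found[speaker_dir.name] = {"model": model, "config": config}
    return found
```

Called on a local checkout of this repository, for example `find_speaker_models(".")`, it should return one entry per speaker directory that contains both files.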

Performance (CPU)

| Metric      | Value          |
|-------------|----------------|
| Avg latency | ~54 ms         |
| Avg RTF     | 0.030          |
| Speed       | 33.6x realtime |
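
For readers unfamiliar with the metrics: RTF (real-time factor) is synthesis time divided by the duration of the audio produced, and the speed figure is its reciprocal. The sketch below works through that arithmetic with the numbers above. The implied audio length is a rough estimate that assumes the latency figure is the total synthesis time per utterance, which the README does not state.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    return synthesis_seconds / audio_seconds


def speedup(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf


def implied_audio_seconds(latency_seconds, rtf):
    """Audio duration implied by a latency and an RTF (rough estimate)."""
    return latency_seconds / rtf


print(round(speedup(0.030), 1))                       # 33.3
print(round(implied_audio_seconds(0.054, 0.030), 2))  # 1.8
```

The reciprocal of the rounded RTF is about 33.3x, slightly below the reported 33.6x. The two agree if the unrounded RTF is close to 0.0298.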

Training

All models trained with:

  • Architecture: VITS
  • Phonemizer: espeak-ng (ms voice)
  • Sample rate: 22050 Hz
  • GPU: NVIDIA H200 NVL
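
To illustrate the phonemizer setting, here is a minimal sketch that calls the `espeak-ng` command-line tool with the Malay (`ms`) voice and returns IPA phonemes. The training pipeline's actual phonemizer invocation is not given in this README, so treat this as an approximation. The function names are hypothetical, and `espeak-ng` is assumed to be installed and on `PATH`.

```python
import subprocess


def build_espeak_command(text, voice="ms"):
    """Build the espeak-ng argument list.

    -q     suppress audio output
    --ipa  print phonemes in IPA
    -v     select the voice (ms = Malay)
    """
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize(text, voice="ms"):
    """Return the IPA string espeak-ng prints for `text`."""
    result = subprocess.run(
        build_espeak_command(text, voice),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Calling `phonemize("selamat pagi")` prints nothing itself and returns the phoneme string. The exact symbols depend on the installed espeak-ng version, so no expected output is shown.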