Revolab VITS — Multi-Speaker Bahasa Melayu TTS

VITS voice models trained on Revolab Malay speech datasets.

Speakers (2 production-quality)

Speaker ID	Name	Samples	CER	WER
sarah	sarah	27,792	0.0537	0.1835
paan	Paan	27,434	0.0681	0.1561

Both speakers score a CER below 10%, the threshold used here for production quality. The CER and WER columns are fractions, so 0.0537 means 5.37%.
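For readers unfamiliar with the two metrics, the sketch below shows the standard definitions: edit distance divided by reference length, counted over characters for CER and over whitespace-separated words for WER. This README does not say how the evaluation was run (which ASR model transcribed the audio, or how text was normalized), so the function names and example strings here are illustrative assumptions, not the project's evaluation script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate as a fraction of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate as a fraction of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: one wrong word out of three.
print(wer("saya suka makan", "saya suka minum"))
```

A speaker would pass the threshold above when `cer(...)`, averaged over the evaluation set, stays under 0.10.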

Structure

speakers.json              # Speaker registry with eval metrics
speakers/
  <name>/
    model.onnx             # ONNX export for inference
    model.onnx.json        # Phoneme config
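Given this layout, the files for a speaker can be located by name. The following is a minimal sketch using only the directory and file names shown above; the helper names `speaker_files` and `list_speakers` are hypothetical, and the schema of `speakers.json` is not documented here, so the sketch does not read it.

```python
from pathlib import Path


def speaker_files(root, name):
    """Return (model_path, config_path) for one speaker directory."""
    speaker_dir = Path(root) / "speakers" / name
    model = speaker_dir / "model.onnx"
    config = speaker_dir / "model.onnx.json"
    missing = [p.name for p in (model, config) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"{speaker_dir}: missing {', '.join(missing)}")
    return model, config


def list_speakers(root):
    """Sorted names of speaker directories that contain both expected files."""
    base = Path(root) / "speakers"
    if not base.is_dir():
        return []
    return sorted(
        d.name
        for d in base.iterdir()
        if d.is_dir()
        and (d / "model.onnx").is_file()
        and (d / "model.onnx.json").is_file()
    )
```

For example, `speaker_files(".", "sarah")` would return the two paths under `speakers/sarah/` when run from the repository root, assuming the directory is named after the speaker ID.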

Performance (CPU)

Metric	Value
Avg latency	~54 ms
Avg RTF (real-time factor)	0.030
Speed	33.6x realtime
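The three figures above are related: the real-time factor is synthesis time divided by the duration of the audio produced, and the speed multiple is its reciprocal. The sketch below works through that arithmetic with the reported values. It assumes the averages can be combined directly, which is only approximately true when they are taken per utterance, and the function names are illustrative.

```python
def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def speedup(rtf):
    """How many times faster than real time a given RTF is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Figures reported in the table above.
avg_latency_s = 0.054
avg_rtf = 0.030

# Average clip length implied by the two averages.
implied_audio_s = avg_latency_s / avg_rtf
print(f"implied average clip length: {implied_audio_s:.2f} s")  # 1.80 s

# Reciprocal of the rounded RTF.
print(f"speed-up from rounded RTF: {speedup(avg_rtf):.1f}x")    # 33.3x
```

The reciprocal of the rounded RTF gives 33.3x rather than the reported 33.6x. The gap is consistent with the RTF having been rounded to three decimals (an unrounded value near 0.0298 gives 33.6x).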

Training

All models trained with:

Architecture: VITS
Phonemizer: espeak-ng (ms voice)
Sample rate: 22,050 Hz
GPU: NVIDIA H200 NVL