WavTTS

This repository hosts the official WavTTS checkpoint for zero-shot text-to-speech generation at 16 kHz. For details, see the WavTTS GitHub repository (https://github.com/cwx-worst-one/WavTTS) and the paper (arXiv:2606.03455).

Files

model_1200000.pt: official WavTTS checkpoint.
vocab.txt: matching vocabulary file for the released checkpoint.

Usage

This checkpoint is meant to be used with the WavTTS codebase. Clone and install it first:

git clone https://github.com/cwx-worst-one/WavTTS
cd WavTTS
pip install -e .

Run inference with the default checkpoint:

wavtts_infer-cli \
  --model WavTTS \
  --ref_audio "path/to/reference.wav" \
  --ref_text "The transcription of the reference audio." \
  --gen_text "The text you want to synthesize."
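
For scripted or batch use, the same command can be driven from Python. The following is a minimal sketch, assuming `wavtts_infer-cli` is on the PATH after the install step above. The helper names `build_command` and `synthesize` are hypothetical and not part of the WavTTS codebase, and only the flags shown in the command above are used.

```python
import subprocess
from typing import List


def build_command(ref_audio: str, ref_text: str, gen_text: str,
                  model: str = "WavTTS") -> List[str]:
    """Assemble the argument list for wavtts_infer-cli (hypothetical helper)."""
    return [
        "wavtts_infer-cli",
        "--model", model,
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


def synthesize(ref_audio: str, ref_text: str, gen_text: str) -> None:
    """Run one inference call; raises CalledProcessError on a non-zero exit."""
    # Passing a list instead of a shell string avoids quoting problems
    # when the reference or target text contains quotes or spaces.
    subprocess.run(build_command(ref_audio, ref_text, gen_text), check=True)
```

Calling `synthesize("path/to/reference.wav", "The transcription of the reference audio.", "The text you want to synthesize.")` should be equivalent to the shell command above. Where the generated audio is written depends on the CLI defaults, which are not covered here.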

The default WavTTS configuration automatically downloads model_1200000.pt from this repository. To specify the checkpoint and vocabulary files explicitly instead, set:

ckpt_file = "hf://worstchan/WavTTS/model_1200000.pt"
vocab_file = "infer/examples/vocab.txt"
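
To fetch the checkpoint ahead of time (for example, to cache it before running inference), the `hf://` URI can be resolved by hand. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that an `hf://<user>/<repo>/<file>` URI maps to a repository ID plus a filename. `split_hf_uri` and `fetch` are hypothetical helpers, not part of WavTTS.

```python
from typing import Tuple


def split_hf_uri(uri: str) -> Tuple[str, str]:
    """Split 'hf://user/repo/path/to/file' into (repo_id, filename)."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    # The first two components form the repo ID; the rest is the file path.
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://user/repo/file, got {uri!r}")
    user, repo, filename = parts
    return f"{user}/{repo}", filename


def fetch(uri: str) -> str:
    """Download the file into the local Hugging Face cache and return its path."""
    # Imported lazily so the parsing helper works without the dependency.
    from huggingface_hub import hf_hub_download

    repo_id, filename = split_hf_uri(uri)
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

For instance, `fetch("hf://worstchan/WavTTS/model_1200000.pt")` would return a local cache path. Whether `ckpt_file` also accepts such a local path is an assumption to verify against the WavTTS codebase.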

License

The released model weights are licensed under CC BY-NC 4.0 due to the license restrictions of the Emilia training dataset. The WavTTS codebase is released under the MIT License.

Citation

If you find WavTTS useful, please cite the paper:

@article{chen2026wavtts,
  title={WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling},
  author={Chen, Wenxi and Jia, Dongya and Chen, Yushen and Niu, Zhikang and Liang, Yuzhe and Li, Xiquan and Yan, Ruiqi and Ma, Ziyang and Yang, Guanrou and Chen, Sanyuan and others},
  journal={arXiv preprint arXiv:2606.03455},
  year={2026}
}

Paper

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling (arXiv:2606.03455, published Jun 2)