---
license: cc-by-nc-4.0
language:
- en
- zh
pipeline_tag: text-to-speech
tags:
- text-to-speech
- zero-shot-tts
- waveform-generation
datasets:
- amphion/Emilia-Dataset
---

# WavTTS

This repository hosts the official WavTTS checkpoint for zero-shot text-to-speech generation at 16 kHz.

Please refer to the [GitHub repository](https://github.com/cwx-worst-one/WavTTS) and [paper](https://arxiv.org/abs/2606.03455) for more details.

## Files

- `model_1200000.pt`: official WavTTS checkpoint.
- `vocab.txt`: matching vocabulary file for the released checkpoint.

## Usage

Please use this checkpoint with the WavTTS codebase:

```bash
git clone https://github.com/cwx-worst-one/WavTTS
cd WavTTS
pip install -e .
```

Run inference with the default checkpoint:

```bash
wavtts_infer-cli \
  --model WavTTS \
  --ref_audio "path/to/reference.wav" \
  --ref_text "The transcription of the reference audio." \
  --gen_text "The text you want to synthesize."
```

The default WavTTS configuration downloads `model_1200000.pt` from this repository automatically. To use the files explicitly, set:

```toml
ckpt_file = "hf://worstchan/WavTTS/model_1200000.pt"
vocab_file = "infer/examples/vocab.txt"
```

## License

The released model weights are licensed under CC BY-NC 4.0 due to the license restrictions of the Emilia training dataset. The WavTTS codebase is released under the MIT License.

## Citation

If you find WavTTS useful, please cite the paper:

```bibtex
@article{chen2026wavtts,
  title={WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling},
  author={TODO},
  journal={TODO},
  year={2026}
}
```
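
## Appendix: fetching the files manually

The Usage section points `ckpt_file` at an `hf://` path. If you would rather download the files yourself (for example, to cache them before running offline), the sketch below shows one possible way to do it with `huggingface_hub.hf_hub_download`. It is not part of the WavTTS codebase: `split_hf_uri` and `fetch` are hypothetical helpers, the `hf://<user>/<repo>/<file>` layout is an assumption read off the config example, and the repository id `worstchan/WavTTS` is inferred from that same path. How the codebase itself resolves `hf://` paths is not described here.

```python
from pathlib import Path


def split_hf_uri(uri: str) -> tuple[str, str]:
    """Split 'hf://<user>/<repo>/<file>' into (repo_id, filename).

    Assumption: the first two path segments form the Hub repo id and
    everything after them is the file path inside that repo.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<user>/<repo>/<file>, got {uri!r}")
    user, repo, filename = parts
    return f"{user}/{repo}", filename


def fetch(uri: str) -> Path:
    """Download one file from the Hub and return its local cache path."""
    # Imported lazily so split_hf_uri works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    repo_id, filename = split_hf_uri(uri)
    return Path(hf_hub_download(repo_id=repo_id, filename=filename))


if __name__ == "__main__":
    # Both file names come from the Files section of this model card.
    ckpt = fetch("hf://worstchan/WavTTS/model_1200000.pt")
    vocab = fetch("hf://worstchan/WavTTS/vocab.txt")
    print(ckpt, vocab, sep="\n")
```

The printed paths point into the local Hugging Face cache. Whether `ckpt_file` and `vocab_file` also accept plain local paths is not stated above, so check the WavTTS codebase before substituting them into the config.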