---
license: cc-by-nc-4.0
language:
- en
- zh
pipeline_tag: text-to-speech
tags:
- text-to-speech
- zero-shot-tts
- waveform-generation
datasets:
- amphion/Emilia-Dataset
---
# WavTTS
This repository hosts the official WavTTS checkpoint for zero-shot text-to-speech generation at 16 kHz. Please refer to the [GitHub repository](https://github.com/cwx-worst-one/WavTTS) and [paper](https://arxiv.org/abs/2606.03455) for more details.
## Files
- `model_1200000.pt`: official WavTTS checkpoint.
- `vocab.txt`: matching vocabulary file for the released checkpoint.
## Usage
Please use this checkpoint with the WavTTS codebase:
```bash
git clone https://github.com/cwx-worst-one/WavTTS
cd WavTTS
pip install -e .
```
Run inference with the default checkpoint:
```bash
wavtts_infer-cli \
--model WavTTS \
--ref_audio "path/to/reference.wav" \
--ref_text "The transcription of the reference audio." \
--gen_text "The text you want to synthesize."
```
The default WavTTS configuration downloads `model_1200000.pt` from this repository automatically. To use the files explicitly, set:
```toml
ckpt_file = "hf://worstchan/WavTTS/model_1200000.pt"
vocab_file = "infer/examples/vocab.txt"
```
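If you would rather fetch the files ahead of time (for example, on a machine without network access at inference time), the sketch below shows one possible approach. It assumes the `hf://` path follows the pattern `hf://<user>/<repo>/<file>`, that `vocab.txt` sits at the root of this repository as listed under Files, and that the `huggingface_hub` command-line tool is installed. The helper name `download_wavtts` and the target directory `ckpts/WavTTS` are hypothetical and not part of the WavTTS codebase.

```bash
#!/usr/bin/env bash
# Split the hf:// path from the config into a repository id and a file name.
# Assumption: the path has the form hf://<user>/<repo>/<file>.
ckpt_uri="hf://worstchan/WavTTS/model_1200000.pt"
path="${ckpt_uri#hf://}"                          # strip the hf:// scheme
repo_id="$(printf '%s' "$path" | cut -d/ -f1-2)"  # first two path segments
ckpt_name="${path#"$repo_id"/}"                   # remainder is the file name

# Hypothetical helper: download the checkpoint and vocabulary into a local
# directory (default: ckpts/WavTTS). Requires `pip install -U huggingface_hub`.
download_wavtts() {
  local target_dir="${1:-ckpts/WavTTS}"
  huggingface-cli download "$repo_id" "$ckpt_name" vocab.txt --local-dir "$target_dir"
}
```

Calling `download_wavtts` (optionally with a directory argument) should place both files locally, after which `ckpt_file` and `vocab_file` can point at the downloaded paths. Recent `huggingface_hub` releases also provide the same command as `hf download`.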
## License
The released model weights are licensed under CC BY-NC 4.0 due to the license restrictions of the Emilia training dataset. The WavTTS codebase is released under the MIT License.
## Citation
If you find WavTTS useful, please cite the paper:
```bibtex
@article{chen2026wavtts,
title={WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling},
author={TODO},
journal={TODO},
year={2026}
}
```