	---
	license: cc-by-nc-4.0
	language:
	- en
	- zh
	pipeline_tag: text-to-speech
	tags:
	- text-to-speech
	- zero-shot-tts
	- waveform-generation
	datasets:
	- amphion/Emilia-Dataset
	---

	# WavTTS

	This repository hosts the official WavTTS checkpoint for zero-shot text-to-speech generation at 16 kHz. For details, see the [GitHub repository](https://github.com/cwx-worst-one/WavTTS) and the [paper](https://arxiv.org/abs/2606.03455).

	## Files

	- `model_1200000.pt`: official WavTTS checkpoint.
	- `vocab.txt`: matching vocabulary file for the released checkpoint.

	## Usage

	Use this checkpoint with the WavTTS codebase. Clone the repository and install it in editable mode:

	```bash
	git clone https://github.com/cwx-worst-one/WavTTS
	cd WavTTS
	pip install -e .
	```

	Run inference with the default checkpoint:

	```bash
	wavtts_infer-cli \
	  --model WavTTS \
	  --ref_audio "path/to/reference.wav" \
	  --ref_text "The transcription of the reference audio." \
	  --gen_text "The text you want to synthesize."
	```
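
To synthesize several sentences with the same reference voice, the command above can be driven from a short script. The following is a minimal sketch, not part of the WavTTS codebase: the helper names `build_infer_command` and `synthesize_all` are hypothetical, and it uses only the CLI flags shown above. It assumes `wavtts_infer-cli` is on your `PATH` after installation.

```python
import subprocess


def build_infer_command(ref_audio: str, ref_text: str, gen_text: str) -> list[str]:
    """Build the argument list for one wavtts_infer-cli call.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    return [
        "wavtts_infer-cli",
        "--model", "WavTTS",
        "--ref_audio", ref_audio,
        "--ref_text", ref_text,
        "--gen_text", gen_text,
    ]


def synthesize_all(ref_audio: str, ref_text: str, sentences: list[str]) -> None:
    """Run the CLI once per sentence, stopping at the first failure."""
    for sentence in sentences:
        # Passing a list (no shell) avoids quoting problems in the text.
        subprocess.run(
            build_infer_command(ref_audio, ref_text, sentence),
            check=True,
        )
```

Each call starts a new process, so this sketch favors simplicity over speed. Where the CLI writes its output is not covered here; check the GitHub repository for that.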

	The default WavTTS configuration downloads `model_1200000.pt` from this repository automatically. To point to the files explicitly, set the following fields in the configuration:

	```toml
	ckpt_file = "hf://worstchan/WavTTS/model_1200000.pt"
	vocab_file = "infer/examples/vocab.txt"
	```
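
If you prefer to fetch the files yourself, for example to cache them before an offline run, the sketch below shows one way to do it. It assumes the `huggingface_hub` package is installed and that an `hf://` path has the layout `hf://<user>/<repo>/<path>`; how the WavTTS codebase itself resolves `hf://` paths is not shown here. The helper names `split_hf_uri` and `fetch` are hypothetical.

```python
def split_hf_uri(uri: str) -> tuple[str, str]:
    """Split 'hf://<user>/<repo>/<path>' into (repo_id, filename).

    Assumed layout; not taken from the WavTTS codebase.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<user>/<repo>/<path>, got {uri!r}")
    user, repo, filename = parts
    return f"{user}/{repo}", filename


def fetch(uri: str) -> str:
    """Download one file from the Hub and return its local cache path."""
    # Imported lazily so split_hf_uri works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    repo_id, filename = split_hf_uri(uri)
    return hf_hub_download(repo_id=repo_id, filename=filename)
```

Calling `fetch("hf://worstchan/WavTTS/model_1200000.pt")` returns a path inside the local Hugging Face cache, which can then be used as a plain file path for `ckpt_file`.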

	## License

	The released model weights are licensed under CC BY-NC 4.0 due to the license restrictions of the Emilia training dataset. The WavTTS codebase is released under the MIT License.

	## Citation

	If you find WavTTS useful, please cite the paper:

	```bibtex
	@article{chen2026wavtts,
	  title={WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling},
	  author={TODO},
	  journal={TODO},
	  year={2026}
	}
	```