---
datasets:
- k2-fsa/OpenDialog
- amphion/Emilia-Dataset
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-to-speech
tags:
- text-to-speech
---

# ZipVoice⚡: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching

This repository provides checkpoints for **ZipVoice-Dialog**, a non-autoregressive zero-shot spoken dialogue generation model presented in [ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching](https://huggingface.co/papers/2507.09318). It also includes the ZipVoice and ZipVoice-Distill text-to-speech checkpoints.

The project and demo page is available at [https://zipvoice-dialog.github.io](https://zipvoice-dialog.github.io).

## 1. Explanation of each directory

| Directory                  | Model Type             | Training Data                 | Initialized from           |
| :------------------------- | :--------------------: | :---------------------------: | :------------------------: |
| zipvoice                   | ZipVoice               | Emilia                        | -                          |
| zipvoice_libritts          | ZipVoice               | LibriTTS                      | -                          |
| zipvoice_distill           | ZipVoice-Distill       | Emilia                        | zipvoice/model.pt          |
| zipvoice_distill_libritts  | ZipVoice-Distill       | LibriTTS                      | zipvoice_libritts/model.pt |
| zipvoice_dialog            | ZipVoice-Dialog        | OpenDialog + in-house dataset | zipvoice/model.pt          |
| zipvoice_dialog_opendialog | ZipVoice-Dialog        | OpenDialog                    | zipvoice/model.pt          |
| zipvoice_dialog_stereo     | ZipVoice-Dialog-Stereo | in-house dataset              | zipvoice_dialog/model.pt   |

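The "Initialized from" column forms a small lineage graph. The minimal Python sketch below transcribes the table into a dictionary and follows those links back to a checkpoint trained from scratch. `Checkpoint`, `CHECKPOINTS`, `weights_path`, and `init_chain` are hypothetical helpers written for illustration and are not part of the ZipVoice codebase. The sketch also assumes that every directory stores its weights as `model.pt`, which the table's own entries (e.g. `zipvoice/model.pt`) suggest.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Checkpoint:
    """One row of the directory table (transcribed by hand, hypothetical helper)."""
    model_type: str
    training_data: str
    initialized_from: Optional[str]  # parent directory, or None if trained from scratch


# Transcribed from the table; "zipvoice/model.pt" is recorded as its directory "zipvoice".
CHECKPOINTS: Dict[str, Checkpoint] = {
    "zipvoice": Checkpoint("ZipVoice", "Emilia", None),
    "zipvoice_libritts": Checkpoint("ZipVoice", "LibriTTS", None),
    "zipvoice_distill": Checkpoint("ZipVoice-Distill", "Emilia", "zipvoice"),
    "zipvoice_distill_libritts": Checkpoint("ZipVoice-Distill", "LibriTTS", "zipvoice_libritts"),
    "zipvoice_dialog": Checkpoint("ZipVoice-Dialog", "OpenDialog + in-house dataset", "zipvoice"),
    "zipvoice_dialog_opendialog": Checkpoint("ZipVoice-Dialog", "OpenDialog", "zipvoice"),
    "zipvoice_dialog_stereo": Checkpoint("ZipVoice-Dialog-Stereo", "in-house dataset", "zipvoice_dialog"),
}


def weights_path(directory: str) -> str:
    """Relative weights path, assuming the `<directory>/model.pt` layout seen in the table."""
    if directory not in CHECKPOINTS:
        raise KeyError(f"unknown checkpoint directory: {directory!r}")
    return f"{directory}/model.pt"


def init_chain(directory: str) -> List[str]:
    """Follow 'Initialized from' links until reaching a checkpoint with no parent."""
    chain = [directory]
    parent = CHECKPOINTS[directory].initialized_from
    while parent is not None:
        chain.append(parent)
        parent = CHECKPOINTS[parent].initialized_from
    return chain


if __name__ == "__main__":
    # Print each checkpoint's lineage, newest first.
    for name in CHECKPOINTS:
        print(f"{name:28s} {' <- '.join(init_chain(name))}")
```

For example, `init_chain("zipvoice_dialog_stereo")` returns `["zipvoice_dialog_stereo", "zipvoice_dialog", "zipvoice"]`. This shows that the stereo dialog model ultimately builds on the Emilia-trained ZipVoice checkpoint. To fetch a file, a path from `weights_path` could presumably be passed as the `filename` argument of `huggingface_hub.hf_hub_download`, together with this repository's ID (assumed here to be `k2-fsa/ZipVoice`). The inference scripts in the GitHub repository remain the reference for loading the models.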
## 2. GitHub

See our GitHub repository [ZipVoice](https://github.com/k2-fsa/ZipVoice) for more details.

## 3. Discussion & Communication

You can discuss directly on [GitHub Issues](https://github.com/k2-fsa/ZipVoice/issues).

You can also scan the QR codes to join our WeChat group or follow our WeChat official account.

| WeChat Group | WeChat Official Account |
| ------------ | ----------------------- |

## 4. Citation

```bibtex
@article{zhu2025zipvoice,
  title={ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching},
  author={Zhu, Han and Kang, Wei and Yao, Zengwei and Guo, Liyong and Kuang, Fangjun and Li, Zhaoqing and Zhuang, Weiji and Lin, Long and Povey, Daniel},
  journal={arXiv preprint arXiv:2506.13053},
  year={2025}
}

@article{zhu2025zipvoicedialog,
  title={ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching},
  author={Zhu, Han and Kang, Wei and Guo, Liyong and Yao, Zengwei and Kuang, Fangjun and Zhuang, Weiji and Li, Zhaoqing and Han, Zhifeng and Zhang, Dong and Zhang, Xin and Song, Xingchen and Lin, Long and Povey, Daniel},
  journal={arXiv preprint arXiv:2507.09318},
  year={2025}
}
```