---
license: apache-2.0
language:
- en
pipeline_tag: text-to-speech
---

<h1 align="center">DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Discrete Flow Matching</h1>

<p align="center">
<a href="https://github.com/Fsoft-AIC/DiFlowTTS/tree/main"><img src="https://img.shields.io/badge/GitHub-Code-181717?logo=github" alt="GitHub"></a>
<a href="https://arxiv.org/abs/2509.09631"><img src="https://img.shields.io/badge/arXiv-2509.09631-b31b1b?logo=arxiv" alt="Paper"></a>
<a href="https://fsoft-aic.github.io/SonNN45-Demo/projects/diflowtts/"><img src="https://img.shields.io/badge/Demo-Page-blue?logo=googlechrome&logoColor=white" alt="Demo"></a>
<a href="https://interspeech2026.org/en-AU"><img src="https://img.shields.io/badge/Interspeech-2026-orange" alt="Interspeech 2026"></a>
</p>

> [!NOTE]
> DiFlow-TTS is trained on 470 hours of the LibriTTS dataset, which consists of **predominantly neutral speech**. As a result, **it may perform poorly on prompts with strong emotional expression**.

Download the [DiFlow-TTS checkpoint](https://huggingface.co/Fsoft-AIC/DiFlowTTS/blob/main/diflow-tts.ckpt) and place it as follows:

```
root/
└── ckpts/
    └── diflow-tts.ckpt
```
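To fetch the checkpoint programmatically instead, here is a minimal Python sketch. It assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`) and that `root` is the directory shown in the layout. The repository ID `Fsoft-AIC/DiFlowTTS` and the filename `diflow-tts.ckpt` come from the checkpoint link. The helpers `expected_ckpt_path` and `download_checkpoint` are hypothetical and are not part of the DiFlow-TTS codebase.

```python
"""Fetch the DiFlow-TTS checkpoint into <root>/ckpts/ (illustrative sketch)."""
from __future__ import annotations

from pathlib import Path

# Taken from the checkpoint URL in this README.
REPO_ID = "Fsoft-AIC/DiFlowTTS"
CKPT_NAME = "diflow-tts.ckpt"


def expected_ckpt_path(root: str | Path = ".") -> Path:
    """Return <root>/ckpts/diflow-tts.ckpt, mirroring the layout above.

    Hypothetical helper, not part of the DiFlow-TTS repository.
    """
    return Path(root) / "ckpts" / CKPT_NAME


def download_checkpoint(root: str | Path = ".") -> Path:
    """Download the checkpoint into <root>/ckpts/ unless it is already there.

    Hypothetical helper; needs `huggingface_hub` and network access
    only when the file is missing.
    """
    target = expected_ckpt_path(root)
    if target.is_file():
        return target  # already in place, skip the download
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    target.parent.mkdir(parents=True, exist_ok=True)
    # With local_dir set, recent huggingface_hub versions write the file to
    # <local_dir>/<filename>, i.e. <root>/ckpts/diflow-tts.ckpt.
    local = hf_hub_download(repo_id=REPO_ID, filename=CKPT_NAME, local_dir=target.parent)
    return Path(local)
```

Run `download_checkpoint(".")` from the directory that should act as `root/`. It returns the local checkpoint path and skips the network call if `ckpts/diflow-tts.ckpt` already exists. Downloading the file manually from the link works just as well.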