metadata
title: E2 TTS
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: cc-by-nc-4.0

E2 TTS Text-to-Speech

E2 TTS is a fully non-autoregressive, zero-shot text-to-speech model built on a masked U-Net Transformer trained with flow matching.

Features

  • Zero-shot voice cloning
  • Multilingual support: English and Chinese
  • High-quality 24 kHz audio output

Usage

  1. Upload a reference audio clip (3-10 seconds recommended)
  2. Enter the transcript of the reference audio
  3. Enter the text you want to synthesize
  4. Select the language
  5. Click "Synthesize"
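
Before uploading in step 1, it can help to confirm that the reference clip falls inside the recommended 3-10 second window. The following is a minimal sketch using only the Python standard library. It assumes the clip is an uncompressed PCM WAV file; the helper names (`clip_duration_seconds`, `is_recommended_length`, `make_silent_wav`) are hypothetical and not part of this Space.

```python
import io
import wave

# Bounds taken from the "3-10 seconds recommended" guidance in step 1.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(wav_file) -> float:
    """Return the duration of a PCM WAV file (path or binary file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(wav_file) -> bool:
    """True if the clip falls inside the recommended 3-10 second window."""
    return MIN_SECONDS <= clip_duration_seconds(wav_file) <= MAX_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, used here only as demo input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


if __name__ == "__main__":
    demo_clip = make_silent_wav(5.0)
    print(is_recommended_length(demo_clip))
```

In practice you would pass the path of your own recording to `is_recommended_length` instead of the generated silent clip. Other formats such as MP3 or FLAC would need a third-party audio library.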

Model Information

  • Architecture: Non-Autoregressive, Masked, Flow Matching, U-Net Transformer
  • Sample Rate: 24000 Hz
  • Parameters: 335M
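
As a rough guide to what these figures imply, the sketch below converts a clip length into a sample count at 24000 Hz and estimates the storage needed for the weights alone. It treats "335M" as a round 335,000,000 and assumes 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit floats; the precision this Space actually runs at is not stated here, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000          # sample rate listed above
PARAMETER_COUNT = 335_000_000    # "335M", taken as a round figure (assumption)


def samples_for(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples in a mono clip of the given length."""
    return round(seconds * sample_rate)


def weights_size_gb(parameter_count: int, bytes_per_parameter: int) -> float:
    """Approximate size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return parameter_count * bytes_per_parameter / 1e9


if __name__ == "__main__":
    print(samples_for(10.0))                    # samples in a 10-second clip
    print(weights_size_gb(PARAMETER_COUNT, 4))  # 32-bit floats (assumed)
    print(weights_size_gb(PARAMETER_COUNT, 2))  # 16-bit floats (assumed)
```

Under these assumptions a 10-second clip is 240,000 samples, and the weights take about 1.34 GB in 32-bit floats or about 0.67 GB in 16-bit floats. The estimate covers weights only, not activations or any other runtime memory.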

Citation

@inproceedings{e2-tts,
  title={{E2 TTS}: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot {TTS}},
  author={Eskimez, Sefik Emre and Wang, Xiaofei and Thakker, Manthan and Li, Canrun and Tsai, Chung-Hsien and Xiao, Zhen and Yang, Hemin and Zhu, Zirun and Tang, Min and Tan, Xu and others},
  booktitle={2024 IEEE Spoken Language Technology Workshop (SLT)},
  pages={682--689},
  year={2024},
  organization={IEEE}
}

Links