E2 TTS
This is a mirror of the original weights for use with TTSDB.
Original weights: https://huggingface.co/SWivid/E2-TTS
Original code: https://github.com/SWivid/F5-TTS
A fully non-autoregressive, zero-shot text-to-speech model based on a masked, flow-matching U-Net Transformer.
Original Work
E2 TTS was developed by Eskimez et al. Please cite their work if you use this model:
```bibtex
@inproceedings{e2-tts,
  title={{E2 TTS}: Embarrassingly easy fully non-autoregressive zero-shot tts},
  author={Eskimez, Sefik Emre and Wang, Xiaofei and Thakker, Manthan and Li, Canrun and Tsai, Chung-Hsien and Xiao, Zhen and Yang, Hemin and Zhu, Zirun and Tang, Min and Tan, Xu and others},
  booktitle={2024 IEEE Spoken Language Technology Workshop (SLT)},
  pages={682--689},
  year={2024},
  organization={IEEE}
}
```
Papers:
Installation
```bash
pip install ttsdb-e2-tts
```
Usage
```python
from ttsdb_e2_tts import E2TTS

# Load the model (downloads weights automatically)
model = E2TTS(model_id="ttsds/E2 TTS")

# Synthesize speech in the voice of the reference clip
audio, sample_rate = model.synthesize(
    text="Hello, this is a test of E2 TTS.",               # text to speak
    reference_audio="path/to/reference.wav",               # short clip of the target speaker
    text_reference="Transcript of the reference audio.",   # exact transcript of that clip
    language="en",                                          # language of `text`
)

# Save the output
model.save_audio(audio, sample_rate, "output.wav")
```
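Because the model is non-autoregressive, the total length of the output has to be fixed before sampling starts. In the original F5-TTS inference code, the reference transcript is used for this: the speaking rate of the reference clip (frames per unit of text) is assumed to carry over to the new text. The sketch below illustrates that proportional heuristic. The names `estimate_total_frames` and `frames_to_seconds`, the mel hop size `HOP_LENGTH = 256`, and UTF-8 byte counting are assumptions for illustration; the `ttsdb_e2_tts` wrapper may estimate duration differently.

```python
"""Sketch: estimate output length for a non-autoregressive TTS model.

Assumption: modeled on the proportional duration heuristic in the F5-TTS
inference code; not the ttsdb_e2_tts implementation.
"""

SAMPLE_RATE = 24_000  # from the model card
HOP_LENGTH = 256      # assumption: mel-spectrogram hop size (samples per frame)


def estimate_total_frames(ref_num_samples: int, ref_text: str, gen_text: str,
                          speed: float = 1.0) -> int:
    """Return total mel frames: reference frames plus frames for gen_text."""
    ref_frames = ref_num_samples // HOP_LENGTH
    # Text length is measured in UTF-8 bytes, so a Chinese character
    # (3 bytes) weighs more than a Latin letter (1 byte).
    ref_len = len(ref_text.encode("utf-8"))
    gen_len = len(gen_text.encode("utf-8"))
    if ref_len == 0:
        raise ValueError("text_reference must not be empty")
    # Assume the reference speaking rate carries over; speed > 1 shortens output.
    gen_frames = int(ref_frames / ref_len * gen_len / speed)
    return ref_frames + gen_frames


def frames_to_seconds(frames: int) -> float:
    """Convert a mel-frame count back to seconds of audio."""
    return frames * HOP_LENGTH / SAMPLE_RATE
```

For a 3-second reference clip (72,000 samples at 24 kHz) and the two sentences from the usage snippet, this gives 545 frames in total, of which the first 281 correspond to the reference clip; `frames_to_seconds(545)` is about 5.8 seconds.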
Model Details
| Property | Value |
|---|---|
| Sample Rate | 24000 Hz |
| Parameters | 335M |
| Architecture | Non-Autoregressive, Masked, Flow Matching, U-Net Transformer |
| Languages | English, Chinese |
| Release Date | 2024-10-30 |
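The architecture row lists masked flow matching. As a rough illustration of that idea (not the authors' implementation), the sketch below uses a straight-line conditional flow matching path: a noisy input `x_t = (1 - t) * x0 + t * x1` is built from Gaussian noise `x0` and data `x1`, the network is trained to predict the velocity `x1 - x0`, and the loss is taken only over a masked span that the model must infill from the unmasked context. At inference, the learned velocity field is integrated from noise with an ODE solver; plain fixed-step Euler is used here. Each frame is a single float rather than a mel vector, the network is replaced by a caller-supplied `velocity` function, and the 0.7 to 1.0 mask fraction is an assumption borrowed from the F5-TTS training configs.

```python
"""Toy sketch of masked conditional flow matching (not the E2 TTS code).

Frames are single floats instead of mel-spectrogram vectors, and the neural
network is replaced by a plain function, so only the path, the training
target, the masked loss, and Euler sampling are illustrated.
"""
import random
from typing import Callable, List, Tuple

Frames = List[float]
VelocityFn = Callable[[Frames, float], Frames]


def make_infill_mask(num_frames: int, rng: random.Random,
                     min_frac: float = 0.7, max_frac: float = 1.0) -> List[bool]:
    """Mask one contiguous span covering min_frac..max_frac of the frames.

    The 0.7-1.0 range is an assumption taken from the F5-TTS configs.
    """
    frac = rng.uniform(min_frac, max_frac)
    span = min(num_frames, max(1, round(frac * num_frames)))
    start = rng.randint(0, num_frames - span)
    return [start <= i < start + span for i in range(num_frames)]


def interpolate(x0: Frames, x1: Frames, t: float) -> Frames:
    """Straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def training_example(x1: Frames, rng: random.Random) -> Tuple[Frames, float, Frames]:
    """Return (x_t, t, target velocity) for one training step."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    t = rng.random()
    target = [b - a for a, b in zip(x0, x1)]  # d(x_t)/dt along the straight path
    return interpolate(x0, x1, t), t, target


def masked_mse(pred: Frames, target: Frames, mask: List[bool]) -> float:
    """Mean squared error over masked (to-be-generated) frames only."""
    errs = [(p - q) ** 2 for p, q, m in zip(pred, target, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def euler_sample(velocity: VelocityFn, x0: Frames, steps: int = 32) -> Frames:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for k in range(steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

With the exact velocity `x1 - x0` as an oracle, Euler integration lands on `x1` regardless of the step count, because the path is a straight line. A trained network only approximates this field, which is why the number of sampling steps affects quality and speed in practice.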
Training Data
- Emilia Dataset (100,000 hours)
License
- Weights: Creative Commons Attribution-NonCommercial 4.0
- Code: MIT License
Please refer to the original repositories for full license terms.
Links
- Original Code: https://github.com/SWivid/F5-TTS
- Original Weights: https://huggingface.co/SWivid/E2-TTS
- TTSDB Package: ttsdb-e2-tts
- TTSDB GitHub: https://github.com/ttsds/ttsdb