license: cc-by-nc-4.0
language:
  - eng
  - zho
tags:
  - tts
  - text-to-speech
  - speech-synthesis
  - voice-cloning
library_name: ttsdb
pipeline_tag: text-to-speech

E2 TTS

This is a mirror of the original weights for use with TTSDB.

Original weights: https://huggingface.co/SWivid/E2-TTS
Original code: https://github.com/SWivid/F5-TTS

E2 TTS is a non-autoregressive text-to-speech model built on a masked, flow-matching U-Net transformer.

Original Work

E2 TTS was developed by the authors of the paper cited below; this repository only mirrors their weights. Please cite their work if you use this model:

@inproceedings{e2-tts,
  title={{E2 TTS}: Embarrassingly easy fully non-autoregressive zero-shot {TTS}},
  author={Eskimez, Sefik Emre and Wang, Xiaofei and Thakker, Manthan and Li, Canrun and Tsai, Chung-Hsien and Xiao, Zhen and Yang, Hemin and Zhu, Zirun and Tang, Min and Tan, Xu and others},
  booktitle={2024 IEEE Spoken Language Technology Workshop (SLT)},
  pages={682--689},
  year={2024},
  organization={IEEE}
}

Papers:

Installation

pip install ttsdb-e2-tts

Usage

from ttsdb_e2_tts import E2TTS

# Load the model (downloads weights automatically)
model = E2TTS(model_id="ttsds/e2-tts")

# Synthesize speech
audio, sample_rate = model.synthesize(
    text="Hello, this is a test of E2 TTS.",
    reference_audio="path/to/reference.wav",
    text_reference="Transcript of the reference audio.",
    language="en",
)

# Save the output
model.save_audio(audio, sample_rate, "output.wav")
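To sanity-check a synthesized clip independently of `save_audio`, the samples can be written and re-read with Python's standard `wave` module. The sketch below is only an illustration under stated assumptions: it assumes `audio` is a one-dimensional sequence of float samples in the range [-1, 1], which this card does not confirm, and it uses a generated sine tone as a stand-in for real model output. The helper names `write_wav_pcm16` and `wav_duration_seconds` are hypothetical and not part of `ttsdb_e2_tts`.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav_pcm16(path, samples, sample_rate):
    """Write mono float samples (assumed to lie in [-1, 1]) as 16-bit PCM WAV."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip out-of-range values
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(frames))


def wav_duration_seconds(path):
    """Return the clip length in seconds from the WAV header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


# Stand-in for model output: half a second of a 440 Hz tone at 24 kHz.
sample_rate = 24000
audio = [
    0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate // 2)
]

out_path = os.path.join(tempfile.gettempdir(), "e2_tts_sketch.wav")
write_wav_pcm16(out_path, audio, sample_rate)
duration = wav_duration_seconds(out_path)
```

With real output, replace the sine tone with the `audio` and `sample_rate` values returned by `synthesize`; if `audio` turns out to be a multi-dimensional array or tensor, it would need to be flattened to a single channel first.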

Model Details

| Property | Value |
|---|---|
| Sample Rate | 24000 Hz |
| Parameters | 335M |
| Architecture | Non-Autoregressive, Masked, Flow Matching, U-Net Transformer |
| Languages | English, Chinese |
| Release Date | 2024-10-30 |
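The sample rate and parameter count above translate into rough sizing figures. The following is a back-of-the-envelope sketch, not a measurement: it treats 335M as exactly 335,000,000 parameters and counts only the weights themselves, ignoring activations and any other components loaded alongside the model. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 24_000      # from the table above
PARAM_COUNT = 335_000_000    # "335M", treated as an exact figure here


def samples_for(seconds):
    """Number of audio samples in a clip of the given length."""
    return round(seconds * SAMPLE_RATE_HZ)


def weight_gigabytes(bytes_per_param):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return PARAM_COUNT * bytes_per_param / 1e9


samples_10s = samples_for(10)    # samples in a 10-second clip
fp32_gb = weight_gigabytes(4)    # 32-bit floats, 4 bytes each
fp16_gb = weight_gigabytes(2)    # 16-bit floats, 2 bytes each
```

Under these assumptions a 10-second clip holds 240,000 samples, and the weights occupy about 1.34 GB in 32-bit precision or about 0.67 GB in 16-bit precision.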

Training Data

License

  • Weights: Creative Commons Attribution-NonCommercial 4.0
  • Code: MIT License

Please refer to the original repositories for full license terms.

Links