	---
	# Generated at 2026-01-29T18:15:41Z from templates/weights/README.md.j2
	license: cc-by-nc-4.0
	language:
	- eng
	- zho
	tags:
	- tts
	- text-to-speech
	- speech-synthesis
	- voice-cloning
	library_name: ttsdb
	pipeline_tag: text-to-speech

	---

	# E2 TTS

	> This is a mirror of the original weights for use with [TTSDB](https://github.com/ttsds/ttsdb).
	>
	> Original weights: [https://huggingface.co/SWivid/E2-TTS](https://huggingface.co/SWivid/E2-TTS)
	>
	> Original code: [https://github.com/SWivid/F5-TTS](https://github.com/SWivid/F5-TTS)


	E2 TTS is a non-autoregressive text-to-speech model: a masked U-Net transformer trained with flow matching, with zero-shot voice cloning for English and Chinese.



	## Original Work

	This repository only mirrors the weights; E2 TTS itself is the work of the original authors. If you use this model, please cite their paper:


	```bibtex
	@inproceedings{e2-tts,
	  title={{E2 TTS}: Embarrassingly easy fully non-autoregressive zero-shot {TTS}},
	  author={Eskimez, Sefik Emre and Wang, Xiaofei and Thakker, Manthan and Li, Canrun and Tsai, Chung-Hsien and Xiao, Zhen and Yang, Hemin and Zhu, Zirun and Tang, Min and Tan, Xu and others},
	  booktitle={2024 IEEE Spoken Language Technology Workshop (SLT)},
	  pages={682--689},
	  year={2024},
	  organization={IEEE}
	}
	```



	Papers:

	- https://ieeexplore.ieee.org/abstract/document/10832320



	## Installation

	```bash
	pip install ttsdb-e2-tts
	```
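
To confirm that the install succeeded, the distribution can be looked up with the standard library. This is a minimal sketch: `installed_version` is an illustrative helper, not part of TTSDB, and it only assumes the distribution name `ttsdb-e2-tts` from the command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="ttsdb-e2-tts"):
    """Return the installed version string, or None if it is not installed.

    Illustrative helper, not part of TTSDB.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```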

	## Usage

	```python
	from ttsdb_e2_tts import E2TTS

	# Load the model (downloads weights automatically)
	model = E2TTS(model_id="ttsds/e2-tts")

	# Synthesize speech
	audio, sample_rate = model.synthesize(
	    text="Hello, this is a test of E2 TTS.",
	    reference_audio="path/to/reference.wav",
	    text_reference="Transcript of the reference audio.",
	    language="en",
	)

	# Save the output
	model.save_audio(audio, sample_rate, "output.wav")
	```
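
If you would rather not rely on `model.save_audio`, the output can also be written with Python's standard `wave` module. The sketch below assumes that `audio` is a one-dimensional sequence of floats in the range [-1.0, 1.0], which this card does not state. `write_wav_pcm16` and `make_test_tone` are hypothetical helpers, not part of `ttsdb_e2_tts`; the tone only stands in for synthesized audio.

```python
import math
import struct
import wave


def write_wav_pcm16(samples, sample_rate, path):
    """Write mono float samples to a 16-bit PCM WAV file.

    Hypothetical helper, not part of ttsdb_e2_tts. Assumes `samples` is a
    one-dimensional sequence of floats; values outside [-1.0, 1.0] are clipped.
    Returns the number of frames written.
    """
    # Clip each sample, then scale it to the signed 16-bit integer range.
    pcm = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Pack as little-endian signed 16-bit integers.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)


def make_test_tone(sample_rate=24000, seconds=0.5, freq_hz=440.0):
    """Generate a sine tone as a list of floats (placeholder for model output)."""
    n = int(sample_rate * seconds)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

With real model output, the call would be `write_wav_pcm16(audio, sample_rate, "output.wav")`, provided the assumption about the audio format holds.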

	## Model Details

	| Property | Value |
	|----------|-------|
	| Sample Rate | 24000 Hz |
	| Parameters | 335M |
	| Architecture | Non-Autoregressive, Masked, Flow Matching, U-Net Transformer |
	| Languages | English, Chinese |
	| Release Date | 2024-10-30 |
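
The sample rate fixes the relationship between sample count and duration: at 24000 Hz, one second of audio is 24000 samples. A small sketch of that arithmetic follows; the helper names are illustrative and not part of TTSDB, and the byte estimate assumes 16-bit mono PCM, which is a choice made here rather than something this card specifies.

```python
SAMPLE_RATE_HZ = 24000  # from the Model Details table


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Length of a clip in seconds, given its number of samples."""
    return num_samples / sample_rate


def samples_for(seconds, sample_rate=SAMPLE_RATE_HZ):
    """Number of samples needed to hold a clip of the given length."""
    return round(seconds * sample_rate)


def pcm16_mono_bytes(seconds, sample_rate=SAMPLE_RATE_HZ):
    """Raw payload size, assuming 16-bit (2-byte) mono PCM samples."""
    return samples_for(seconds, sample_rate) * 2
```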


	### Training Data


	- [Emilia Dataset](https://huggingface.co/datasets/amphion/Emilia-Dataset) (100,000 hours)




	## License

	- Weights: Creative Commons Attribution-NonCommercial 4.0
	- Code: MIT License

	Please refer to the original repositories for full license terms.

	## Links

	- Original Code: [https://github.com/SWivid/F5-TTS](https://github.com/SWivid/F5-TTS)
	- Original Weights: [https://huggingface.co/SWivid/E2-TTS](https://huggingface.co/SWivid/E2-TTS)
	- TTSDB Package: [ttsdb-e2-tts](https://pypi.org/project/ttsdb-e2-tts/)
	- TTSDB GitHub: [https://github.com/ttsds/ttsdb](https://github.com/ttsds/ttsdb)