---
license: cc-by-nc-4.0
language:
- eng
- zho
tags:
- tts
- text-to-speech
- speech-synthesis
- voice-cloning
library_name: ttsdb
pipeline_tag: text-to-speech
base_model:
- SWivid/F5-TTS
---

# F5-TTS

> **This is a mirror of the original weights for use with [TTSDB](https://github.com/ttsds/ttsdb).**
>
> Original weights: [https://huggingface.co/SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
> Original code: [https://github.com/SWivid/F5-TTS](https://github.com/SWivid/F5-TTS)

Non-autoregressive flow-matching (DiT) text-to-speech model by [Yushen Chen](https://github.com/SWivid).

## Original Work

This model was created by the original authors. Please cite their work if you use this model:

```bibtex
@inproceedings{f5-tts,
    title = "F5-{TTS}: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching",
    author = "Chen, Yushen and
      Niu, Zhikang and
      Ma, Ziyang and
      Deng, Keqi and
      Wang, Chunhui and
      Zhao, Jian and
      Yu, Kai and
      Chen, Xie",
    editor = "Che, Wanxiang and
      Nabende, Joyce and
      Shutova, Ekaterina and
      Pilehvar, Mohammad Taher",
    booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.acl-long.313/",
    doi = "10.18653/v1/2025.acl-long.313",
    pages = "6255--6271",
    ISBN = "979-8-89176-251-0",
    abstract = "This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as duration model, text encoder, and phoneme alignment, the text input is simply padded with filler tokens to the same length as input speech, and then the denoising is performed for speech generation, which was originally proved feasible by E2 TTS. However, the original design of E2 TTS makes it hard to follow due to its slow convergence and low robustness. To address these issues, we first model the input with ConvNeXt to refine the text representation, making it easy to align with the speech. We further propose an inference-time Sway Sampling strategy, which significantly improves our model{'}s performance and efficiency. This sampling strategy for flow step can be easily applied to existing flow matching based models without retraining. Our design allows faster training and achieves an inference RTF of 0.15, which is greatly improved compared to state-of-the-art diffusion-based TTS models. Trained on a public 100K hours multilingual dataset, our F5-TTS exhibits highly natural and expressive zero-shot ability, seamless code-switching capability, and speed control efficiency. We have released all codes and checkpoints to promote community development, at https://SWivid.github.io/F5-TTS/."
}
```

**Papers:**

- https://aclanthology.org/2025.acl-long.313/

## Installation

```bash
pip install ttsdb-f5-tts
```

## Usage

```python
from ttsdb_f5_tts import F5TTS

# Load the model (downloads weights automatically)
model = F5TTS(model_id="ttsds/f5-tts")

# Synthesize speech
audio, sample_rate = model.synthesize(
    text="Hello, this is a test of F5-TTS.",
    reference_audio="path/to/reference.wav",
)

# Save the output
model.save_audio(audio, sample_rate, "output.wav")
```

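If you want to post-process the waveform yourself rather than call `model.save_audio`, the returned samples can be written with Python's standard `wave` module. This is a sketch under stated assumptions: it presumes `synthesize` returns a 1-D float array in `[-1, 1]` (the actual return type may differ), and the synthetic sine tone below merely stands in for real model output.

```python
import wave

import numpy as np


def write_wav(path: str, audio: np.ndarray, sample_rate: int) -> None:
    """Convert float samples in [-1, 1] to 16-bit PCM and write a mono WAV file."""
    pcm = np.clip(audio, -1.0, 1.0)
    pcm = (pcm * 32767).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 16-bit samples
        f.setframerate(sample_rate)  # e.g. 24000 Hz for F5-TTS output
        f.writeframes(pcm.tobytes())


# Stand-in for model output: a 1-second 440 Hz tone at the model's 24 kHz rate.
sr = 24000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)
write_wav("tone.wav", tone, sr)
```

In real use you would pass the `audio` and `sample_rate` values returned by `model.synthesize(...)` instead of the synthetic tone.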
## Model Details

| Property | Value |
|----------|-------|
| **Sample Rate** | 24000 Hz |
| **Parameters** | 335M |
| **Architecture** | Non-autoregressive, flow matching, Diffusion Transformer |
| **Languages** | English, Chinese |
| **Release Date** | 2024-10-30 |

### Training Data

- [Emilia Dataset](https://huggingface.co/datasets/amphion/Emilia-Dataset) (100,000 hours)

## License

- **Weights:** Creative Commons Attribution-NonCommercial 4.0
- **Code:** MIT License

Please refer to the original repositories for full license terms.

## Links

- **Original Code:** [https://github.com/SWivid/F5-TTS](https://github.com/SWivid/F5-TTS)
- **Original Weights:** [https://huggingface.co/SWivid/F5-TTS](https://huggingface.co/SWivid/F5-TTS)
- **TTSDB Package:** [ttsdb-f5-tts](https://pypi.org/project/ttsdb-f5-tts/)
- **TTSDB GitHub:** [https://github.com/ttsds/ttsdb](https://github.com/ttsds/ttsdb)