	---
	license: cc-by-nc-4.0
	pipeline_tag: text-to-speech
	tags:
	- tts
	- speech-synthesis
	- emotion-control
	---

	# EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

	EmoVoice is an emotion-controllable Text-to-Speech (TTS) model that uses large language models (LLMs) to enable fine-grained, freestyle natural-language emotion control. It also includes a phoneme boost variant designed to improve content consistency.

	This model was presented in the paper: [EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting](https://huggingface.co/papers/2504.12867).

	For more details, check out the [project page](https://yanghaha0908.github.io/EmoVoice/) and the [GitHub repository](https://github.com/yanghaha0908/EmoVoice).

	## Installation

	### Create a separate environment if needed

	```bash
	conda create -n EmoVoice python=3.10
	conda activate EmoVoice
	pip install -r requirements.txt
	```

	## Usage

	### Decode with checkpoints
	```bash
	bash examples/tts/scripts/inference_EmoVoice.sh
	bash examples/tts/scripts/inference_EmoVoice-PP.sh
	bash examples/tts/scripts/inference_EmoVoice_1.5B.sh
	```

	## Train from scratch
	```bash
	# First Stage: Pretrain TTS
	bash examples/tts/scripts/pretrain_EmoVoice.sh
	bash examples/tts/scripts/pretrain_EmoVoice-PP.sh
	bash examples/tts/scripts/pretrain_EmoVoice_1.5B.sh

	# Second Stage: Finetune Emotional TTS
	bash examples/tts/scripts/ft_EmoVoice.sh
	bash examples/tts/scripts/ft_EmoVoice-PP.sh
	bash examples/tts/scripts/ft_EmoVoice_1.5B.sh
	```
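
The two stages are meant to run in order for a given variant: pretraining first, then emotional fine-tuning. As a rough sketch of that sequencing, the wrapper below runs both scripts for one variant and stops if pretraining fails. The function name `train_emovoice` and the `DRY_RUN` switch are hypothetical helpers and not part of the repository; the script paths are the ones listed above, and any data or checkpoint paths are whatever those scripts define.

```bash
# Hypothetical helper (not part of the repository): run both training stages
# for one variant, in order, and stop if the first stage fails.
train_emovoice() {
  variant="${1:-EmoVoice}"            # EmoVoice, EmoVoice-PP, or EmoVoice_1.5B
  for stage in pretrain ft; do
    script="examples/tts/scripts/${stage}_${variant}.sh"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "bash $script"             # print the command instead of running it
    else
      bash "$script" || return 1      # do not fine-tune if pretraining failed
    fi
  done
}

# Preview the commands for one variant without starting any training.
DRY_RUN=1 train_emovoice EmoVoice-PP
# bash examples/tts/scripts/pretrain_EmoVoice-PP.sh
# bash examples/tts/scripts/ft_EmoVoice-PP.sh
```

Dropping `DRY_RUN=1` runs the scripts for real, from the repository root.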

	## Checkpoints
	- Checkpoints can be found on Hugging Face: https://huggingface.co/yhaha/EmoVoice
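
One way to fetch the checkpoints locally is the download command that ships with the `huggingface_hub` package. The sketch below is an assumption about workflow rather than a documented step of this project: the function name `fetch_emovoice_ckpt`, the `DRY_RUN` and `HF_CLI` variables, and the default target directory `ckpt/EmoVoice` are all hypothetical. Recent `huggingface_hub` releases also expose the same subcommand as `hf download`, which is why the CLI name is configurable.

```bash
# Hypothetical helper (not part of the repository): download the released
# checkpoints from the Hugging Face repository linked above.
fetch_emovoice_ckpt() {
  target="${1:-ckpt/EmoVoice}"        # assumed local directory
  cli="${HF_CLI:-huggingface-cli}"    # set HF_CLI=hf to use the newer CLI name
  set -- "$cli" download yhaha/EmoVoice --local-dir "$target"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"                         # print the command instead of running it
  else
    "$@"
  fi
}

# Preview the command; drop DRY_RUN=1 to actually download.
DRY_RUN=1 fetch_emovoice_ckpt
# huggingface-cli download yhaha/EmoVoice --local-dir ckpt/EmoVoice
```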

	## Dataset

	- Pretrain TTS: [VoiceAssistant](https://huggingface.co/datasets/worstchan/VoiceAssistant-400K-SLAM-Omni)
	- Finetune Emotional TTS: [EmoVoice-DB](https://huggingface.co/datasets/yhaha/EmoVoice-DB) and part of [laions_got_talent](https://huggingface.co/datasets/laion/laions_got_talent)
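
The stage-to-dataset mapping above can be sketched as a small lookup that prints one download command per dataset. The function name `datasets_for_stage` and the `data/` target directory are hypothetical; the repository IDs come from the links above. The commands are only printed so they can be reviewed before running.

```bash
# Hypothetical helper (not part of the repository): map a training stage to
# the Hugging Face dataset repositories listed above.
datasets_for_stage() {
  case "$1" in
    pretrain) echo "worstchan/VoiceAssistant-400K-SLAM-Omni" ;;
    ft)       echo "yhaha/EmoVoice-DB laion/laions_got_talent" ;;
    *)        echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Print one download command per dataset for review instead of running it.
# data/ is an assumed target directory; only part of laions_got_talent is
# used for fine-tuning, and which part is not specified here.
for stage in pretrain ft; do
  for repo in $(datasets_for_stage "$stage"); do
    echo "huggingface-cli download $repo --repo-type dataset --local-dir data/$repo"
  done
done
```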

	## Acknowledgements
	- Our code is built on [SLAM-LLM](https://github.com/X-LANCE/SLAM-LLM).
	- We also thank [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) for its valuable repository.

	## Citation

	If you find our work and codebase useful, please cite:
	```
	@article{yang2025emovoice,
	title={EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting},
	author={Yang, Guanrou and Yang, Chen and Chen, Qian and Ma, Ziyang and Chen, Wenxi and Wang, Wen and Wang, Tianrui and Yang, Yifan and Niu, Zhikang and Liu, Wenrui and others},
	journal={arXiv preprint arXiv:2504.12867},
	year={2025}
	}
	```