---
license: mit
---

# EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

[![Paper](https://img.shields.io/badge/Paper-arXiv-b31b1b?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2504.12867) [![Code](https://img.shields.io/badge/Code-Repo-black?style=flat&logo=github&logoColor=white)](https://github.com/yanghaha0908/EmoVoice?tab=readme-ov-file) [![Hugging Face Space](https://img.shields.io/badge/Space-HuggingFace-orange?logo=huggingface)](https://huggingface.co/spaces/chenxie95/EmoVoice) [![Hugging Face Dataset](https://img.shields.io/badge/Dataset-HuggingFace-green?logo=huggingface)](https://huggingface.co/datasets/yhaha/EmoVoice-DB) [![Webpage](https://img.shields.io/badge/Website-DemoPage-pink?logo=googlechrome&logoColor=white)](https://yanghaha0908.github.io/EmoVoice/)

## Overview

EmoVoice is an emotion-controllable TTS model that exploits large language models (LLMs) to enable fine-grained, freestyle natural-language emotion control. EmoVoice achieves state-of-the-art (SOTA) performance on the English EmoVoice-DB and Chinese SECap test sets.

### Checkpoints

English model checkpoints for EmoVoice (0.5B), EmoVoice (1.5B), and EmoVoice-PP (0.5B) are released here, together with Qwen2.5-0.5B-phn, the Qwen2.5-0.5B tokenizer with a phoneme-extended vocabulary. A minimal download sketch is given after the citation below.

## Citation

If our work is useful for you, please cite:

```bibtex
@article{yang2025emovoice,
  title={EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting},
  author={Yang, Guanrou and Yang, Chen and Chen, Qian and Ma, Ziyang and Chen, Wenxi and Wang, Wen and Wang, Tianrui and Yang, Yifan and Niu, Zhikang and Liu, Wenrui and others},
  journal={arXiv preprint arXiv:2504.12867},
  year={2025}
}
```
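As referenced in the checkpoint list above, the snippet below sketches one way to fetch the released checkpoints and load the phoneme-extended tokenizer. It is a minimal sketch, not the official pipeline: the repo id `yhaha/EmoVoice` is an assumption (use this card's actual repo id), and synthesis itself runs through the inference scripts in the GitHub repository linked above. `snapshot_download` and `AutoTokenizer` are standard `huggingface_hub` and `transformers` APIs.

```python
# Sketch: download the released EmoVoice checkpoints and load the
# phoneme-extended Qwen2.5-0.5B tokenizer. The repo id and the
# "Qwen2.5-0.5B-phn" subfolder name are assumptions based on the
# checkpoint list in this card.
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

# Download all files in the model repo (checkpoints + tokenizer folders).
local_dir = snapshot_download(repo_id="yhaha/EmoVoice")  # repo id assumed

# Load the Qwen2.5-0.5B tokenizer with the phoneme-extended vocabulary.
tokenizer = AutoTokenizer.from_pretrained(f"{local_dir}/Qwen2.5-0.5B-phn")

print(f"checkpoints in: {local_dir}")
print(f"tokenizer vocab size: {len(tokenizer)}")

# Inference itself is driven by the EmoVoice codebase (see the GitHub
# repo); its scripts consume the checkpoints downloaded above.
```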