Improve model card: add pipeline tag, paper/project links, usage, and correct license

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +70 -3
README.md CHANGED
@@ -1,3 +1,70 @@
- ---
- license: mit
- ---
+ ---
+ license: cc-by-nc-4.0
+ pipeline_tag: text-to-speech
+ tags:
+ - tts
+ - speech-synthesis
+ - emotion-control
+ ---
+
+ # EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting
+
+ EmoVoice is an emotion-controllable text-to-speech (TTS) model that uses large language models (LLMs) to enable fine-grained emotion control through freestyle natural-language prompts. It also includes a phoneme boost variant designed to enhance content consistency.
+
+ This model was presented in the paper [EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting](https://huggingface.co/papers/2504.12867).
+
+ For more details, see the [project page](https://yanghaha0908.github.io/EmoVoice/) and the [GitHub repository](https://github.com/yanghaha0908/EmoVoice).
+
+ ## Installation
+
+ ### Create a separate environment if needed
+
+ ```bash
+ conda create -n EmoVoice python=3.10
+ conda activate EmoVoice
+ pip install -r requirements.txt
+ ```
+ ## Usage
+
+ ### Decode with checkpoints
+ ```bash
+ bash examples/tts/scripts/inference_EmoVoice.sh
+ bash examples/tts/scripts/inference_EmoVoice-PP.sh
+ bash examples/tts/scripts/inference_EmoVoice_1.5B.sh
+ ```
+ ## Train from scratch
+ ```bash
+ # First Stage: Pretrain TTS
+ bash examples/tts/scripts/pretrain_EmoVoice.sh
+ bash examples/tts/scripts/pretrain_EmoVoice-PP.sh
+ bash examples/tts/scripts/pretrain_EmoVoice_1.5B.sh
+
+ # Second Stage: Finetune Emotional TTS
+ bash examples/tts/scripts/ft_EmoVoice.sh
+ bash examples/tts/scripts/ft_EmoVoice-PP.sh
+ bash examples/tts/scripts/ft_EmoVoice_1.5B.sh
+ ```
+
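The inference, pretraining, and finetuning scripts listed above share one naming pattern: a stage prefix (`inference`, `pretrain`, `ft`) followed by a variant name. The sketch below is an illustration only, not part of the EmoVoice repository; the helper names `script_path` and `training_commands` are hypothetical, and it assumes the commands are run from the repository root.

```python
from pathlib import PurePosixPath

# Directory, stage prefixes, and variant names copied from the commands above.
SCRIPT_DIR = PurePosixPath("examples/tts/scripts")
STAGES = ("pretrain", "ft", "inference")
VARIANTS = ("EmoVoice", "EmoVoice-PP", "EmoVoice_1.5B")


def script_path(stage: str, variant: str) -> str:
    """Return the repo-relative script path for a stage/variant pair (hypothetical helper)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {STAGES}")
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return str(SCRIPT_DIR / f"{stage}_{variant}.sh")


def training_commands(variant: str) -> list[str]:
    """Commands for the two training stages, in the order given above."""
    return [f"bash {script_path(stage, variant)}" for stage in ("pretrain", "ft")]


if __name__ == "__main__":
    # Print the two-stage training recipe for one variant.
    for command in training_commands("EmoVoice-PP"):
        print(command)
```

Keeping the variant fixed across both stages makes it harder to pretrain one variant and accidentally finetune another.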
+ ## Checkpoints
+ - Checkpoints can be found on Hugging Face: https://huggingface.co/yhaha/EmoVoice
+
+ ## Dataset
+
+ - Pretrain TTS: [VoiceAssistant](https://huggingface.co/datasets/worstchan/VoiceAssistant-400K-SLAM-Omni)
+ - Finetune Emotional TTS: [EmoVoice-DB](https://huggingface.co/datasets/yhaha/EmoVoice-DB) and part of [laions_got_talent](https://huggingface.co/datasets/laion/laions_got_talent)
+
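The checkpoint and dataset repositories linked above can be fetched with the `huggingface_hub` library. The following is a minimal sketch, not the authors' documented workflow: the `downloads/` folder layout and the helper names `download_plan` and `fetch` are assumptions, and only the EmoVoice checkpoint and EmoVoice-DB repositories are included by default because the other datasets are large.

```python
# Repository IDs copied from the links above; the local folder layout is an assumption.
CHECKPOINT_REPO = "yhaha/EmoVoice"
DATASET_REPOS = ["yhaha/EmoVoice-DB"]


def download_plan(local_root: str = "downloads") -> list[dict]:
    """List what to fetch and where to put it, without touching the network."""
    repos = [(CHECKPOINT_REPO, "model")] + [(r, "dataset") for r in DATASET_REPOS]
    return [
        {
            "repo_id": repo_id,
            "repo_type": repo_type,
            "local_dir": f"{local_root}/{repo_id.split('/')[-1]}",
        }
        for repo_id, repo_type in repos
    ]


def fetch(plan: list[dict]) -> list[str]:
    """Download every entry in the plan and return the local paths."""
    # Imported lazily so the plan can be inspected without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return [snapshot_download(**entry) for entry in plan]


if __name__ == "__main__":
    for entry in download_plan():
        print(entry)
    # Uncomment to download (large files, needs `pip install huggingface_hub`):
    # fetch(download_plan())
```

Separating the plan from the download step lets you review the target folders before any large transfer starts.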
+ ## Acknowledgements
+ - Our code is built on [SLAM-LLM](https://github.com/X-LANCE/SLAM-LLM).
+ - [CosyVoice](https://github.com/FunAudioLLM/CosyVoice) is also a valuable repo.
+
+ ## Citation
+
+ If our work and codebase are useful to you, please cite:
+ ```
+ @article{yang2025emovoice,
+ title={EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting},
+ author={Yang, Guanrou and Yang, Chen and Chen, Qian and Ma, Ziyang and Chen, Wenxi and Wang, Wen and Wang, Tianrui and Yang, Yifan and Niu, Zhikang and Liu, Wenrui and others},
+ journal={arXiv preprint arXiv:2504.12867},
+ year={2025}
+ }
+ ```