Cocii committed on
Commit 4edc284 · verified · 1 Parent(s): 1bf48fc

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -44,7 +44,7 @@ GLM-TTS is a high-quality text-to-speech (TTS) synthesis system based on large l
 
 By introducing a **Multi-Reward Reinforcement Learning** framework, GLM-TTS significantly improves the expressiveness of generated speech, achieving more natural emotional control compared to traditional TTS systems.
 
-### ✨ Key Features
+### Key Features
 
 * **Zero-shot Voice Cloning:** Clone any speaker's voice with just 3-10 seconds of prompt audio.
 * **RL-enhanced Emotion Control:** Utilizes a multi-reward reinforcement learning framework (GRPO) to optimize prosody and emotion.
@@ -53,7 +53,7 @@ By introducing a **Multi-Reward Reinforcement Learning** framework, GLM-TTS sign
 * **Streaming Inference:** Supports real-time audio generation suitable for interactive applications.
 * **Bilingual Support:** Optimized for Chinese and English mixed text.
 
-## 🧠 System Architecture
+## System Architecture
 
 GLM-TTS follows a two-stage design:
 
@@ -67,7 +67,7 @@ GLM-TTS follows a two-stage design:
 ### Reinforcement Learning Alignment
 To tackle flat emotional expression, GLM-TTS uses a **Group Relative Policy Optimization (GRPO)** algorithm with multiple reward functions (Similarity, CER, Emotion, Laughter) to align the LLM's generation strategy.
 
-## 📊 Evaluation Results
+## Evaluation Results
 
 Evaluated on `seed-tts-eval`. **GLM-TTS_RL** achieves the lowest Character Error Rate (CER) while maintaining high speaker similarity.
 
@@ -79,7 +79,7 @@ Evaluated on `seed-tts-eval`. **GLM-TTS_RL** achieves the lowest Character Error
 | **GLM-TTS (Base)** | 1.03 | 76.1 | 👍 Yes |
 | **GLM-TTS_RL (Ours)** | **0.89** | 76.4 | 👍 Yes |
 
-## 🚀 Quick Start
+## Quick Start
 
 ### Installation
 
@@ -105,7 +105,7 @@ python glmtts_inference.py \
 bash glmtts_inference.sh
 ```
 
-## 🙏 Acknowledgments & Citation
+## Acknowledgments & Citation
 
 We thank the following open-source projects for their support: