GLM-TTS is a high-quality text-to-speech (TTS) synthesis system based on large language models.

By introducing a **Multi-Reward Reinforcement Learning** framework, GLM-TTS significantly improves the expressiveness of generated speech, achieving more natural emotional control compared to traditional TTS systems.

### Key Features

* **Zero-shot Voice Cloning:** Clone any speaker's voice with just 3-10 seconds of prompt audio.
* **RL-enhanced Emotion Control:** Utilizes a multi-reward reinforcement learning framework (GRPO) to optimize prosody and emotion.
* **Streaming Inference:** Supports real-time audio generation suitable for interactive applications.
* **Bilingual Support:** Optimized for Chinese and English mixed text.
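
The streaming mode above can be pictured with a toy sketch. This is not the GLM-TTS API; every name and number here is hypothetical, and it only illustrates the idea of emitting audio in fixed-size chunks as tokens are generated instead of waiting for the full utterance:

```python
def synthesize_stream(text, chunk_tokens=25, tokens_per_char=3):
    """Toy chunked generation: yield speech-token chunks as soon as
    they become available, rather than after the whole utterance is
    decoded. Illustrative only; real token rates depend on the codec."""
    total = len(text) * tokens_per_char  # pretend token budget for the text
    for start in range(0, total, chunk_tokens):
        # In a real system each chunk would be decoded to an audio frame here.
        yield list(range(start, min(start + chunk_tokens, total)))

chunks = list(synthesize_stream("Hello, GLM-TTS!"))
```

A consumer can start playback as soon as the first chunk arrives, which is what makes the mode suitable for interactive applications.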

## System Architecture

GLM-TTS follows a two-stage design:
### Reinforcement Learning Alignment
To tackle flat emotional expression, GLM-TTS uses a **Group Relative Policy Optimization (GRPO)** algorithm with multiple reward functions (Similarity, CER, Emotion, Laughter) to align the LLM's generation strategy.
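
As a rough sketch of how a multi-reward objective can be combined with group-relative normalization (the reward names mirror those listed above, but the weights, values, and signs here are invented for illustration; the actual GLM-TTS reward definitions are not shown in this excerpt):

```python
import statistics

# Hypothetical weights for the reward signals named in the README.
WEIGHTS = {"similarity": 1.0, "cer": 1.0, "emotion": 0.5, "laughter": 0.25}

def scalar_reward(rewards):
    """Weighted sum of individual reward signals for one sampled
    utterance. CER is an error rate, so it enters with a minus sign."""
    return (WEIGHTS["similarity"] * rewards["similarity"]
            - WEIGHTS["cer"] * rewards["cer"]
            + WEIGHTS["emotion"] * rewards["emotion"]
            + WEIGHTS["laughter"] * rewards["laughter"])

def group_relative_advantages(group):
    """GRPO-style advantages: each rollout's scalar reward is
    normalized against the mean/std of its own sampling group,
    so no separate value network is needed."""
    scores = [scalar_reward(g) for g in group]
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores) or 1.0  # guard against zero std
    return [(s - mu) / sigma for s in scores]
```

Rollouts scoring above their group mean get positive advantages and are reinforced; those below are suppressed.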

## Evaluation Results

On the `seed-tts-eval` benchmark, **GLM-TTS_RL** achieves the lowest Character Error Rate (CER) while maintaining high speaker similarity.
| **GLM-TTS (Base)** | 1.03 | 76.1 | Yes |
| **GLM-TTS_RL (Ours)** | **0.89** | 76.4 | Yes |
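
For reference, CER is the character-level edit distance between the transcript of the generated audio and the reference text, divided by the reference length. A minimal self-contained implementation of that metric (not the evaluation code used by `seed-tts-eval`):

```python
def character_error_rate(reference, hypothesis):
    """CER = (insertions + deletions + substitutions) / len(reference),
    computed with a standard one-row Levenshtein DP."""
    ref, hyp = list(reference), list(hypothesis)
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(dp[j] + 1,        # deletion
                      dp[j - 1] + 1,    # insertion
                      prev + (r != h))  # substitution (free on match)
            prev, dp[j] = dp[j], cur
    return dp[len(hyp)] / max(len(ref), 1)
```

A CER of 0.89 in the table above thus means fewer than one character error per hundred reference characters.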

## Quick Start

### Installation

```bash
python glmtts_inference.py \
    ...

bash glmtts_inference.sh
```

## Acknowledgments & Citation

We thank the following open-source projects for their support: