Update README.md
README.md
CHANGED
@@ -2,7 +2,7 @@
 
 ## 👉🏻 CosyVoice 👈🏻
 
-**CosyVoice 3.0**: [Demos](https://funaudiollm.github.io/cosyvoice3/); [Paper](https://arxiv.org/abs/2505.17589); [Modelscope](https://www.modelscope.cn/studios/FunAudioLLM/Fun-CosyVoice3-0.5B); [CV3-Eval](https://github.com/FunAudioLLM/CV3-Eval)
+**Fun-CosyVoice 3.0**: [Demos](https://funaudiollm.github.io/cosyvoice3/); [Paper](https://arxiv.org/abs/2505.17589); [Modelscope](https://www.modelscope.cn/studios/FunAudioLLM/Fun-CosyVoice3-0.5B); [CV3-Eval](https://github.com/FunAudioLLM/CV3-Eval)
 
 **CosyVoice 2.0**: [Demos](https://funaudiollm.github.io/cosyvoice2/); [Paper](https://arxiv.org/abs/2412.10117); [Modelscope](https://www.modelscope.cn/studios/iic/CosyVoice2-0.5B); [HuggingFace](https://huggingface.co/spaces/FunAudioLLM/CosyVoice2-0.5B)
 
@@ -10,7 +10,7 @@
 
 ## Highlight🔥
 
-**CosyVoice 3.0** is an advanced text-to-speech (TTS) system based on large language models (LLM), surpassing its predecessor (CosyVoice 2.0) in content consistency, speaker similarity, and prosody naturalness. It is designed for zero-shot multilingual speech synthesis in the wild.
+**Fun-CosyVoice 3.0** is an advanced text-to-speech (TTS) system based on large language models (LLM), surpassing its predecessor (CosyVoice 2.0) in content consistency, speaker similarity, and prosody naturalness. It is designed for zero-shot multilingual speech synthesis in the wild.
 ### Key Features
 - **Language Coverage**: Covers 9 common languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian), 18+ Chinese dialects/accents and meanwhile supports both multi-lingual/cross-lingual zero-shot voice cloning.
 - **Content Consistency & Naturalness**: Achieves state-of-the-art performance in content consistency, speaker similarity, and prosody naturalness.
@@ -24,7 +24,7 @@
 
 - [x] 2025/12
 
-- [x] release Fun-CosyVoice3-0.5B base model and its training/inference script
+- [x] release Fun-CosyVoice3-0.5B-2512 base model and its training/inference script
 - [x] release Fun-CosyVoice3-0.5B modelscope gradio space
 
 - [x] 2025/08
@@ -75,8 +75,8 @@
 | VoxPCM | 0.93 | 1.85 | 8.87 |
 | GLM-TTS | 1.03 | - | - |
 | GLM-TTS_RL | 0.89 | - | - |
-| CosyVoice3 | 1.21 | 2.24 | 6.71 |
-|
+| Fun-CosyVoice3-0.5B-2512 | 1.21 | 2.24 | 6.71 |
+| Fun-CosyVoice3-0.5B-2512_RL | 0.81 | 1.68 | 5.44 |
 
 
 ## Install