Text-to-Speech
ONNX
Safetensors
Commit 1e20370 by supermustard (verified) · 1 parent: 5f5b6a8

Update README.md

Files changed (1): README.md +5 -5
README.md CHANGED
@@ -2,7 +2,7 @@
 
 ## 👉🏻 CosyVoice 👈🏻
 
-**CosyVoice 3.0**: [Demos](https://funaudiollm.github.io/cosyvoice3/); [Paper](https://arxiv.org/abs/2505.17589); [Modelscope](https://www.modelscope.cn/studios/FunAudioLLM/Fun-CosyVoice3-0.5B); [CV3-Eval](https://github.com/FunAudioLLM/CV3-Eval)
+**Fun-CosyVoice 3.0**: [Demos](https://funaudiollm.github.io/cosyvoice3/); [Paper](https://arxiv.org/abs/2505.17589); [Modelscope](https://www.modelscope.cn/studios/FunAudioLLM/Fun-CosyVoice3-0.5B); [CV3-Eval](https://github.com/FunAudioLLM/CV3-Eval)
 
 **CosyVoice 2.0**: [Demos](https://funaudiollm.github.io/cosyvoice2/); [Paper](https://arxiv.org/abs/2412.10117); [Modelscope](https://www.modelscope.cn/studios/iic/CosyVoice2-0.5B); [HuggingFace](https://huggingface.co/spaces/FunAudioLLM/CosyVoice2-0.5B)
 
@@ -10,7 +10,7 @@
 
 ## Highlight🔥
 
-**CosyVoice 3.0** is an advanced text-to-speech (TTS) system based on large language models (LLM), surpassing its predecessor (CosyVoice 2.0) in content consistency, speaker similarity, and prosody naturalness. It is designed for zero-shot multilingual speech synthesis in the wild.
+**Fun-CosyVoice 3.0** is an advanced text-to-speech (TTS) system based on large language models (LLM), surpassing its predecessor (CosyVoice 2.0) in content consistency, speaker similarity, and prosody naturalness. It is designed for zero-shot multilingual speech synthesis in the wild.
 ### Key Features
 - **Language Coverage**: Covers 9 common languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian), 18+ Chinese dialects/accents and meanwhile supports both multi-lingual/cross-lingual zero-shot voice cloning.
 - **Content Consistency & Naturalness**: Achieves state-of-the-art performance in content consistency, speaker similarity, and prosody naturalness.
@@ -24,7 +24,7 @@
 
 - [x] 2025/12
 
-- [x] release Fun-CosyVoice3-0.5B base model and its training/inference script
+- [x] release Fun-CosyVoice3-0.5B-2512 base model and its training/inference script
 - [x] release Fun-CosyVoice3-0.5B modelscope gradio space
 
 - [x] 2025/08
@@ -75,8 +75,8 @@
 | VoxPCM | 0.93 | 1.85 | 8.87 |
 | GLM-TTS | 1.03 | - | - |
 | GLM-TTS_RL | 0.89 | - | - |
-| CosyVoice3 | 1.21 | 2.24 | 6.71 |
-| CosyVoice3_RL | 0.81 | 1.68 | 5.44 |
+| Fun-CosyVoice3-0.5B-2512 | 1.21 | 2.24 | 6.71 |
+| Fun-CosyVoice3-0.5B-2512_RL | 0.81 | 1.68 | 5.44 |
 
 
 ## Install