FunAudioLLM/Fun-CosyVoice3-0.5B-2512

Text-to-Speech

ONNX

Safetensors

supermustard committed 20 days ago

Commit 1e20370 (verified) · 1 parent: 5f5b6a8

Update README.md

Files changed (1)

README.md +5 -5

@@ -2,7 +2,7 @@
 ## 👉🏻 CosyVoice 👈🏻
-**CosyVoice 3.0**: [Demos](https://funaudiollm.github.io/cosyvoice3/); [Paper](https://arxiv.org/abs/2505.17589); [Modelscope](https://www.modelscope.cn/studios/FunAudioLLM/Fun-CosyVoice3-0.5B); [CV3-Eval](https://github.com/FunAudioLLM/CV3-Eval)
+**Fun-CosyVoice 3.0**: [Demos](https://funaudiollm.github.io/cosyvoice3/); [Paper](https://arxiv.org/abs/2505.17589); [Modelscope](https://www.modelscope.cn/studios/FunAudioLLM/Fun-CosyVoice3-0.5B); [CV3-Eval](https://github.com/FunAudioLLM/CV3-Eval)
 **CosyVoice 2.0**: [Demos](https://funaudiollm.github.io/cosyvoice2/); [Paper](https://arxiv.org/abs/2412.10117); [Modelscope](https://www.modelscope.cn/studios/iic/CosyVoice2-0.5B); [HuggingFace](https://huggingface.co/spaces/FunAudioLLM/CosyVoice2-0.5B)
@@ -10,7 +10,7 @@
 ## Highlight🔥
-**CosyVoice 3.0** is an advanced text-to-speech (TTS) system based on large language models (LLM), surpassing its predecessor (CosyVoice 2.0) in content consistency, speaker similarity, and prosody naturalness. It is designed for zero-shot multilingual speech synthesis in the wild.
+**Fun-CosyVoice 3.0** is an advanced text-to-speech (TTS) system based on large language models (LLM), surpassing its predecessor (CosyVoice 2.0) in content consistency, speaker similarity, and prosody naturalness. It is designed for zero-shot multilingual speech synthesis in the wild.
 ### Key Features
 - **Language Coverage**: Covers 9 common languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian), 18+ Chinese dialects/accents and meanwhile supports both multi-lingual/cross-lingual zero-shot voice cloning.
 - **Content Consistency & Naturalness**: Achieves state-of-the-art performance in content consistency, speaker similarity, and prosody naturalness.
@@ -24,7 +24,7 @@
 - [x] 2025/12
-    - [x] release Fun-CosyVoice3-0.5B base model and its training/inference script
+    - [x] release Fun-CosyVoice3-0.5B-2512 base model and its training/inference script
     - [x] release Fun-CosyVoice3-0.5B modelscope gradio space
 - [x] 2025/08
@@ -75,8 +75,8 @@
 | VoxPCM | 0.93 | 1.85 | 8.87 |
 | GLM-TTS | 1.03 | - | - |
 | GLM-TTS_RL | 0.89 | - | - |
-| CosyVoice3 | 1.21 |  2.24 | 6.71 |
-| CosyVoice3_RL | 0.81 | 1.68 | 5.44 |
+| Fun-CosyVoice3-0.5B-2512 | 1.21 |  2.24 | 6.71 |
+| Fun-CosyVoice3-0.5B-2512_RL | 0.81 | 1.68 | 5.44 |
 ## Install