Create README.md
README.md
ADDED
@@ -0,0 +1,15 @@
# Step-Audio-TTS-3B

Step-Audio-TTS-3B is the industry's first text-to-speech (TTS) model trained on large-scale synthetic data using the LLM-Chat paradigm. It achieves state-of-the-art (SOTA) Character Error Rate (CER) results on the SEED TTS Eval benchmark and supports multiple languages, a range of emotional expressions, and diverse voice-style controls. It is also the first TTS model in the industry capable of generating rap and humming.
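
For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between a reference transcript and a hypothesis transcript, divided by the reference length. The sketch below illustrates that standard definition only. It is not the SEED TTS Eval scoring script: the function names `edit_distance` and `cer` are hypothetical, and a real evaluation would first need transcripts of the synthesized audio and would normally apply text normalization, neither of which is shown here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Illustrative strings only, not benchmark data.
    print(cer("今天天气很好", "今天天汽很好"))
```

Lower is better: a CER of 0 means the hypothesis matches the reference character for character, and the value can exceed 1 when the hypothesis contains many extra characters.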

This repository provides the model weights for Step-Audio-TTS-3B, a large language model (LLM) trained with the dual-codebook approach for text-to-speech synthesis. It also includes a vocoder trained with the same dual-codebook approach and a separate vocoder trained specifically for humming generation.

For more information, please refer to our repository: [Step-Audio](https://github.com/stepfun-ai/Step-Audio).