---
license: apache-2.0
---

## 👉🏻 WenetSpeech-Yue 👈🏻

**WenetSpeech-Yue**: [Demos](https://aslp-lab.github.io/WenetSpeech-Yue/); [Paper](https://arxiv.org/abs/2509.03959); [GitHub](https://github.com/ASLP-lab/WenetSpeech-Yue); [HuggingFace](https://huggingface.co/datasets/ASLP-lab/WenetSpeech-Yue)

## Highlight🔥

**WenetSpeech-Yue TTS Models** have been released!

This repository contains two versions of the TTS models:

1. **ASLP-lab/Cosyvoice2-Yue**: The base model for Cantonese TTS.
2. **ASLP-lab/Cosyvoice2-Yue-ZoengJyutGaai**: A fine-tuned, higher-quality version for more natural speech generation.

## Roadmap

- [x] 2025/9
  - [x] 25 Hz WenetSpeech-Yue TTS models released

## Install

**Clone and install**

- Clone the repo

``` sh
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If cloning the submodules fails due to network issues, rerun the following commands until they succeed
cd CosyVoice
git submodule update --init --recursive
```
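
After cloning, a quick check that the submodule checkout is populated can save a failed import later. The sketch below is not part of CosyVoice: `missing_paths` is a hypothetical helper, the `third_party/Matcha-TTS` path is taken from the usage code in this README, and the script assumes it is run from the directory that contains the `CosyVoice` checkout.

```python
from pathlib import Path

# Paths expected inside the CosyVoice checkout. third_party/Matcha-TTS is the
# submodule this README adds to sys.path; requirements.txt is used during install.
REQUIRED = ["third_party/Matcha-TTS", "requirements.txt"]


def missing_paths(repo_root, required=REQUIRED):
    """Return the entries of `required` that are absent or are empty directories."""
    root = Path(repo_root)
    missing = []
    for rel in required:
        path = root / rel
        if not path.exists():
            missing.append(rel)
        elif path.is_dir() and not any(path.iterdir()):
            # An uninitialised submodule is left behind as an empty directory.
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_paths("CosyVoice")
    if problems:
        print("Missing or empty:", ", ".join(problems))
        print("Try: git submodule update --init --recursive")
    else:
        print("Checkout looks complete.")
```

If the check reports `third_party/Matcha-TTS`, rerunning `git submodule update --init --recursive` inside the checkout is the likely fix.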

- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create Conda env:

``` sh
conda create -n cosyvoice python=3.10
conda activate cosyvoice
# pynini is required by WeTextProcessing; install it with conda, since the conda package works on all platforms.
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
```

**Model download**

1. [Cosyvoice2-Yue](https://huggingface.co/ASLP-lab/Cosyvoice2-Yue)
2. [Cosyvoice2-Yue-ZoengJyutGaai](https://huggingface.co/ASLP-lab/Cosyvoice2-Yue-ZoengJyutGaai)
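
The checkpoints can also be fetched from the Hugging Face Hub in code. The following is a minimal sketch, assuming the `huggingface_hub` package is installed (`pip install huggingface_hub`). The `pretrained_models/` target directory and the helper names `local_dir_for` and `download_model` are illustrative choices, not something this repository prescribes.

```python
from pathlib import Path

# Repository IDs of the two models listed above.
MODEL_REPOS = [
    "ASLP-lab/Cosyvoice2-Yue",
    "ASLP-lab/Cosyvoice2-Yue-ZoengJyutGaai",
]


def local_dir_for(repo_id, base_dir="pretrained_models"):
    """Map 'org/name' to '<base_dir>/name' (a hypothetical local layout)."""
    return Path(base_dir) / repo_id.split("/")[-1]


def download_model(repo_id, base_dir="pretrained_models"):
    """Download all files of `repo_id` and return the local directory as a string."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, base_dir)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return str(target)
```

Calling `download_model("ASLP-lab/Cosyvoice2-Yue")` returns a local directory. Assuming the repository follows the standard CosyVoice2 file layout, that directory should be usable as the first argument to `CosyVoice2(...)` in place of the repo ID.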

**Basic Usage**

We strongly recommend using `CosyVoice2-0.5B` for better performance.
Follow the code below for detailed usage of each model.

``` python
import sys
# Make the Matcha-TTS submodule importable (run from the CosyVoice repo root)
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import CosyVoice, CosyVoice2
from cosyvoice.utils.file_utils import load_wav
import torchaudio
```

**CosyVoice2 Usage**

```python
cosyvoice = CosyVoice2('ASLP-lab/Cosyvoice2-Yue', load_jit=False, load_trt=False, fp16=False)

# NOTE: to reproduce the results on https://funaudiollm.github.io/cosyvoice2, add text_frontend=False during inference
# zero_shot usage
prompt_speech_16k = load_wav('zero_shot_prompt.wav', 16000)

# instruct usage
# The instruction '用粤语说这句话' means 'Say this sentence in Cantonese'.
for i, j in enumerate(cosyvoice.inference_instruct2('收到朋友从远方寄嚟嘅生日礼物,呢份意外嘅惊喜同埋满满嘅祝福令我内心充满咗甜蜜嘅快乐,个笑容就好似花咁咧盛开住。', '用粤语说这句话', prompt_speech_16k, stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
```

## Contact

To leave a message for our research team, feel free to email lhli@mail.nwpu.edu.cn or gzhao@mail.nwpu.edu.cn.