---
license: mit
language:
- en
- zh
base_model:
- CosyVoice2
pipeline_tag: text-to-speech
library_name: transformers
tags:
- CosyVoice2
- Speech
---
# CosyVoice2
This version of CosyVoice2 has been converted to run on the Axera NPU using **w8a16** quantization.
Compatible with Pulsar2 version: 4.2
## Conversion tool links
If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo:
[Cosyvoice](https://github.com/FunAudioLLM/CosyVoice)
[Pulsar2 Link, How to Convert LLM from Huggingface to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)
[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/Cosyvoice2.Axera)
## Supported Platforms
- AX650
- AX650N DEMO Board
- [M4N-Dock(η±θ―ζ΄ΎPro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
**Speech Generation**
| Stage | Time |
|------|------|
| LLM prefill (input_token_num + prompt_token_num in [0, 128]) | 104 ms |
| LLM prefill (input_token_num + prompt_token_num in [128, 256]) | 234 ms |
| Decode | 21.24 token/s |
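
As a rough way to read the table, total generation time can be approximated as prefill latency plus decode time. The sketch below assumes a constant decode rate and ignores Token2Wav processing; the function name `estimate_seconds` is hypothetical and not part of this repository.

```python
def estimate_seconds(num_decode_tokens: int,
                     prefill_ms: float = 234.0,
                     decode_tokens_per_s: float = 21.24) -> float:
    """Approximate LLM time: prefill latency plus decode time.

    Defaults are the figures from the table above (assumption: the
    decode rate stays constant over the whole utterance).
    """
    if num_decode_tokens < 0:
        raise ValueError("num_decode_tokens must be non-negative")
    return prefill_ms / 1000.0 + num_decode_tokens / decode_tokens_per_s


if __name__ == "__main__":
    # 271 is the decode token count reported in the sample log in this README.
    print(f"{estimate_seconds(271):.2f} s")
```

Treat the result as an order-of-magnitude estimate; measured figures on your board may differ.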
## How to use
Download all files from this repository to the device.
### 1. Prepare
#### 1.1 Copy this project to AX650 Board
#### 1.2 Prepare Dependencies
The **Running HTTP Tokenizer Server** and **Processing Prompt Speech** steps require the Python packages below. If you run these two steps on a PC, install the packages on the PC.
```
pip3 install -r scripts/requirements.txt
```
### 2. Start HTTP Tokenizer Server
```
cd scripts
python cosyvoice2_tokenizer.py --host {your host} --port {your port}
```
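
Before starting the runtime on the device, it can help to confirm that the tokenizer server is reachable over the network. The following is a minimal sketch that only tests whether a TCP connection can be opened; it makes no assumption about the server's HTTP routes, and `server_reachable` is a hypothetical helper, not part of this repository.

```python
import socket


def server_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values: substitute the host and port you passed to
    # cosyvoice2_tokenizer.py.
    print(server_reachable("127.0.0.1", 12345, timeout=1.0))
```

Run it from the device that will execute the model, since that is the machine that must reach the server.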
### 3. Run on Axera Device
There are 3 kinds of device: AX650 Board, AXCL aarch64 Board, and AXCL x86 Board.
#### 3.1 Run on AX650 Board
1) Modify the HTTP host in `run_ax650.sh`.
2) Run `run_ax650.sh`
```shell
root@ax650 ~/Cosyvoice2 # bash run_ax650.sh
rm: cannot remove 'output*.wav': No such file or directory
[I][ Init][ 108]: LLM init start
[I][ Init][ 34]: connect http://10.122.86.184:12345 ok
bos_id: 0, eos_id: 1773
7% | ███ | 2 / 27 [3.11s<42.04s, 0.64 count/s] embed_selector init ok[I][ Init][ 138]: attr.axmodel_num:24
100% | ████████████████████████████████ | 27 / 27 [10.32s<10.32s, 2.62 count/s] init post axmodel ok,remain_cmm(7178 MB)
[I][ Init][ 216]: max_token_len : 1023
[I][ Init][ 221]: kv_cache_size : 128, kv_cache_num: 1023
[I][ Init][ 229]: prefill_token_num : 128
[I][ Init][ 233]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 233]: grp: 2, prefill_max_token_num : 128
[I][ Init][ 233]: grp: 3, prefill_max_token_num : 256
[I][ Init][ 233]: grp: 4, prefill_max_token_num : 384
[I][ Init][ 233]: grp: 5, prefill_max_token_num : 512
[I][ Init][ 237]: prefill_max_token_num : 512
[I][ Init][ 249]: LLM init ok
[I][ Init][ 154]: Token2Wav init ok
[I][ main][ 273]:
[I][ Run][ 388]: input token num : 142, prefill_split_num : 2
[I][ Run][ 422]: input_num_token:128
[I][ Run][ 422]: input_num_token:14
[I][ Run][ 607]: ttft: 236.90 ms
[Main/Token2Wav Thread] Processing batch of 28 tokens...
Successfully saved audio to output_0.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 53 tokens...
Successfully saved audio to output_1.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_2.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_3.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_4.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_5.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_6.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_7.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_8.wav (32-bit Float PCM).
[Main/Token2Wav Thread] Processing batch of 78 tokens...
Successfully saved audio to output_9.wav (32-bit Float PCM).
[I][ Run][ 723]: hit eos, llm finished
[I][ Run][ 753]: llm finished
[Main/Token2Wav Thread] Buffer is empty and LLM finished. Exiting.
[I][ Run][ 758]: total decode tokens:271
[N][ Run][ 759]: hit eos,avg 21.47 token/s
Successfully saved audio to output_10.wav (32-bit Float PCM).
Successfully saved audio to output.wav (32-bit Float PCM).
Voice generation pipeline completed.
Type "q" to exit, Ctrl+c to stop current running
text >>
```
Output Speech:
[output.wav](asset/output.wav)
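
The log above reports `prefill_token_num : 128` and shows a 142-token input being prefilled in two passes (128 tokens, then 14). A minimal sketch of that arithmetic follows, assuming the input is simply cut into consecutive chunks of at most `prefill_token_num` tokens; the runtime's actual logic is not shown in this README, and `split_prefill` is a hypothetical name.

```python
from typing import List


def split_prefill(input_token_num: int, prefill_token_num: int = 128) -> List[int]:
    """Split an input length into consecutive prefill chunk sizes."""
    if input_token_num < 0 or prefill_token_num <= 0:
        raise ValueError("token counts must be non-negative and chunk size positive")
    chunks = []
    remaining = input_token_num
    while remaining > 0:
        step = min(remaining, prefill_token_num)
        chunks.append(step)
        remaining -= step
    return chunks


if __name__ == "__main__":
    print(split_prefill(142))  # [128, 14], matching the two input_num_token lines
```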
#### Or run on AX650 Board with Gradio GUI
1) Start server
```
bash run_api_ax650.sh
```
2) Start Gradio GUI
```
python scripts/gradio_demo.py
```
#### 3.2 Run on AXCL aarch64 Board
```
bash run_axcl_aarch64.sh
```
#### Or run on AXCL aarch64 Board with Gradio GUI
1) Start server
```
bash run_api_axcl_aarch64.sh
```
2) Start Gradio GUI
```
python scripts/gradio_demo.py
```
3) Open the page from a browser
The page URL is: `https://{your device ip}:7860`
Note that you need to run these two commands in the project root directory.
#### 3.3 Run on AXCL x86 Board
```
bash run_axcl_x86.sh
```
#### Or run on AXCL x86 Board with Gradio GUI
1) Start server
```
bash run_api_axcl_x86.sh
```
2) Start Gradio GUI
```
python scripts/gradio_demo.py
```
3) Open the page from a browser
The page URL is: `https://{your device ip}:7860`
Note that you need to run these two commands in the project root directory.

### Optional. Process Prompt Speech
Do this step if you want to clone a specific voice.
You can use the audio files in `asset/`.
#### (1). Download wetext
```
pip3 install modelscope
modelscope download --model pengzhendong/wetext --local_dir pengzhendong/wetext
```
#### (2). Process Prompt Speech
Example:
```
python3 scripts/process_prompt.py --prompt_text asset/zh_man1.txt --prompt_speech asset/zh_man1.wav --output zh_man1
```
Adjust the parameters to match your setup. The available options are:
```
python3 scripts/process_prompt.py -h
usage: process_prompt.py [-h] [--model_dir MODEL_DIR] [--wetext_dir WETEXT_DIR] [--sample_rate SAMPLE_RATE] [--prompt_text PROMPT_TEXT] [--prompt_speech PROMPT_SPEECH]
[--output OUTPUT]
options:
-h, --help show this help message and exit
--model_dir MODEL_DIR
tokenizer configuration directionary
--wetext_dir WETEXT_DIR
path to wetext
--sample_rate SAMPLE_RATE
Sampling rate for prompt audio
--prompt_text PROMPT_TEXT
The text content of the prompt(reference) audio. Text or file path.
--prompt_speech PROMPT_SPEECH
The path to prompt(reference) audio.
--output OUTPUT Output data storage directory
```
After executing the above command, files like the following will be generated:
```
flow_embedding.txt
flow_prompt_speech_token.txt
llm_embedding.txt
llm_prompt_speech_token.txt
prompt_speech_feat.txt
prompt_text.txt
```
When running `run_ax650.sh`, set the script's `prompt_files` parameter to this output directory (`zh_man1` in the example above).
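
Before pointing the run script at the output directory, you may want to confirm that prompt processing produced everything. This is a minimal sketch that assumes the six file names listed above are the complete set; `missing_prompt_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import List

# File names taken from the listing above (assumed to be the full set).
EXPECTED_FILES = (
    "flow_embedding.txt",
    "flow_prompt_speech_token.txt",
    "llm_embedding.txt",
    "llm_prompt_speech_token.txt",
    "prompt_speech_feat.txt",
    "prompt_text.txt",
)


def missing_prompt_files(output_dir: str) -> List[str]:
    """Return the expected prompt files that are absent from output_dir."""
    base = Path(output_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # "zh_man1" is the --output value used in the example command.
    missing = missing_prompt_files("zh_man1")
    print("missing:", missing if missing else "none")
```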