Commit fbe77ca
Parent(s): 6aff4ff
fix typos
README.md CHANGED
@@ -25,10 +25,11 @@ V1.0
  This release of DSTK includes three modules:
  1. Semantic Tokenzier
  - Encode the semantic information of speech into discrete speech tokens.
- - frame rate: 25Hz; codebook size: 4096
+ - frame rate: 25Hz; codebook size: 4096;
+ - Support both Chinese and English
  2. Semantic Detokenizer
  - Decode the discrete speech tokens into audible speech waveforms to reconstruct the speech
- -
+ - Support both Chinese and English
  3. Text2token (T2U)
  - Convert text content into speech tokens

@@ -40,7 +41,7 @@ As shown in the figure below, the 3 module could form a pipeline for TTS task.
  As shown in figure below, the tokenizer and detokenizer could also form a pipeline for speech reconstruction task.
  <p align="center"><img src="figs/reconstruction.jpg" width="1200"></p>

- These pipelines achieved top-tier performance on TTS and speech reconstruction on the seed-tts-eval dataset, with less parameters and less supervised data for training:
+ These pipelines achieved top-tier performance on TTS and speech reconstruction on the seed-tts-eval dataset, with less parameters and much less supervised data for training:
  <p align="center"><img src="figs/eval1.jpg" width="1200"></p>
  <p align="center"><img src="figs/eval2.jpg" width="1200"></p>

@@ -77,8 +78,6 @@ sh thirdparty/G2P/patch_for_deps.sh

  ## Usage:
  ### Pipelines
-
-
  ```python
  import sys
  import soundfile as sf