gooorillax committed on
Commit fbe77ca · 1 Parent(s): 6aff4ff
Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -25,10 +25,11 @@ V1.0
  This release of DSTK includes three modules:
  1. Semantic Tokenzier
  - Encode the semantic information of speech into discrete speech tokens.
- - frame rate: 25Hz; codebook size: 4096,supports both Chinese and English
+ - frame rate: 25Hz; codebook size: 4096;
+ - Support both Chinese and English
  2. Semantic Detokenizer
  - Decode the discrete speech tokens into audible speech waveforms to reconstruct the speech
- - Supports both Chinese and English
+ - Support both Chinese and English
  3. Text2token (T2U)
  - Convert text content into speech tokens
 
@@ -40,7 +41,7 @@ As shown in the figure below, the 3 module could form a pipeline for TTS task.
  As shown in figure below, the tokenizer and detokenizer could also form a pipeline for speech reconstruction task.
  <p align="center"><img src="figs/reconstruction.jpg" width="1200"></p>
 
- These pipelines achieved top-tier performance on TTS and speech reconstruction on the seed-tts-eval dataset, with less parameters and less supervised data for training:
+ These pipelines achieved top-tier performance on TTS and speech reconstruction on the seed-tts-eval dataset, with less parameters and much less supervised data for training:
  <p align="center"><img src="figs/eval1.jpg" width="1200"></p>
  <p align="center"><img src="figs/eval2.jpg" width="1200"></p>
 
@@ -77,8 +78,6 @@ sh thirdparty/G2P/patch_for_deps.sh
 
  ## Usage:
  ### Pipelines
-
-
  ```python
  import sys
  import soundfile as sf