Initial text-based TTS tokenizer
Files changed:
- README.md (+2, -2)
- tokenizer.json (+0, -0)
- vocab.json (+0, -0)
README.md
CHANGED

@@ -4,7 +4,7 @@ This tokenizer is designed for text-based Text-to-Speech models.
 
 ## Vocabulary Structure
 
--- **Total Size**:
+- **Total Size**: 20,831
 - **Text Tokens**: ~8,000 (BPE-trained)
 - **Audio Codes**: 12,801 (`<|code_0|>` to `<|code_12800|>`)
 - **Special Tokens**: 22
@@ -54,7 +54,7 @@ BPE subword units trained on English text:
 
 - **Tokenizer Type**: BPE (Byte-Pair Encoding)
 - **Text Vocab Size**: 8,000
--- **Training Data**:
+- **Training Data**: 100,000+ text samples
 - **Min Frequency**: 2
 
 ## Model Compatibility
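The vocabulary figures added in this commit are mutually consistent, which a few lines of Python can confirm. The constants below are taken from the README diff; the exact text-token count is inferred by arithmetic from the other figures (the README itself only says "~8,000"):

```python
# Figures stated in the README diff (assumed accurate here).
TOTAL_SIZE = 20831
SPECIAL_TOKENS = 22

# <|code_0|> through <|code_12800|> inclusive is 12,801 audio-code tokens.
audio_codes = [f"<|code_{i}|>" for i in range(12801)]
assert len(audio_codes) == 12801

# The text-token count follows from the other figures:
text_tokens = TOTAL_SIZE - len(audio_codes) - SPECIAL_TOKENS
print(text_tokens)  # 8008, matching the "~8,000 (BPE-trained)" entry
```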
tokenizer.json
CHANGED

The diff for this file is too large to render. See raw diff.

vocab.json
CHANGED

The diff for this file is too large to render. See raw diff.
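The training settings shown in the README diff (BPE, text vocab size 8,000, min frequency 2) can be sketched with the Hugging Face `tokenizers` library. This is a minimal illustration, not the commit's actual training script: the corpus here is a placeholder for the 100,000+ text samples, and the `<unk>`/`<pad>` special-token names and `Whitespace` pre-tokenizer are assumptions.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Placeholder corpus; the real tokenizer was reportedly trained on 100,000+ samples.
corpus = ["hello world", "text to speech", "hello speech synthesis"]

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

# vocab_size and min_frequency match the values in the README diff;
# the special-token names are assumptions for illustration.
trainer = BpeTrainer(vocab_size=8000, min_frequency=2,
                     special_tokens=["<unk>", "<pad>"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Audio codes are then layered on top of the text vocabulary as added tokens.
audio_codes = [f"<|code_{i}|>" for i in range(12801)]
tokenizer.add_tokens(audio_codes)
```

With a real corpus, saving the result with `tokenizer.save("tokenizer.json")` would produce a file of the same shape as the one committed here.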