# Phoneme-Based TTS Tokenizer
This tokenizer is designed for phoneme-based Text-to-Speech models using the Misaki phoneme set.
## Vocabulary Structure

- Total Size: 12,869
- Phonemes: 49 (Misaki set)
  - Shared (US/UK): 41
  - American-only: 4
  - British-only: 4
- Audio Codes: 12,801 (`<|code_0|>` to `<|code_12800|>`)
- Special Tokens: 18
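The audio codes above are rendered as `<|code_N|>` token strings. A minimal sketch of converting between codec indices and those strings (the helper names are hypothetical, not part of this tokenizer's API):

```python
import re

# Hypothetical helpers for the <|code_N|> range described above
# (0 <= N <= 12800); illustrative only, not part of the tokenizer API.
def codes_to_tokens(codes):
    """Render a list of audio-codec indices as code-token strings."""
    for c in codes:
        if not 0 <= c <= 12800:
            raise ValueError(f"code index out of range: {c}")
    return "".join(f"<|code_{c}|>" for c in codes)

def tokens_to_codes(text):
    """Recover codec indices from a string of <|code_N|> tokens."""
    return [int(m) for m in re.findall(r"<\|code_(\d+)\|>", text)]
```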
## Phoneme Set (Misaki - 49 total)

### Shared Phonemes (41)

Stress Marks (2):
- ˈ - Primary stress
- ˌ - Secondary stress
Consonants (24):
- Simple: b d f h j k l m n p s t v w z
- Special: ɡ ŋ ɹ ʃ ʒ ð θ
- Clusters: ʤ ʧ
Vowels (10):
- ə i u ɑ ɔ ɛ ɜ ɪ ʊ ʌ
Diphthongs (4):
- A (eɪ) I (aɪ) W (aʊ) Y (ɔɪ)
Custom (1):
- ᵊ - Small schwa
### American Phonemes (4)
- æ O ᵻ ɾ
### British Phonemes (4)
- a Q ɒ ː
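Since every Misaki symbol in the inventory above is a single Unicode character, a phoneme string can be validated character by character. A minimal sketch, with the sets transcribed from the tables above (the helper itself is illustrative, not part of any library):

```python
# Phoneme inventories transcribed from the tables above.
SHARED = set("ˈˌ" "bdfhjklmnpstvwz" "ɡŋɹʃʒðθ" "ʤʧ" "əiuɑɔɛɜɪʊʌ" "AIWY" "ᵊ")
AMERICAN = set("æOᵻɾ")
BRITISH = set("aQɒː")

def is_valid_misaki(s, accent="us"):
    """Check that every non-space character is a known Misaki phoneme."""
    allowed = SHARED | (AMERICAN if accent == "us" else BRITISH)
    return all(ch in allowed or ch == " " for ch in s)
```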
## Usage

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zuhri025/phoneme-tts-tokenizer")

# Encode phonemes
phonemes = "hˈɛlˌoʊ wˈɜːld"
tokens = tokenizer.encode(phonemes)

# Decode
decoded = tokenizer.decode(tokens)
```
## Special Tokens

- `<|start_of_speech|>` - Begin speech sequence
- `<|end_of_speech|>` - End speech sequence
- `<|start_of_phonemes|>` - Begin phoneme sequence
- `<|end_of_phonemes|>` - End phoneme sequence
- `<|speech|>` - Transition to audio tokens
- `<|accent_us|>` / `<|accent_uk|>` - Accent conditioning
- `<|male|>` / `<|female|>` - Gender conditioning
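As an illustration of how these conditioning tokens might be combined into one input string before encoding, here is a hedged sketch; the token ordering is an assumption for illustration, not documented by this card:

```python
# Hypothetical prompt assembly using the special tokens listed above.
# The ordering (accent, gender, then phonemes) is an assumption.
def build_prompt(phonemes, accent="us", gender="female"):
    return (
        f"<|accent_{accent}|><|{gender}|>"
        f"<|start_of_phonemes|>{phonemes}<|end_of_phonemes|>"
        f"<|speech|>"  # the model would continue with <|code_N|> tokens
    )

prompt = build_prompt("hˈɛlˌoʊ", accent="us", gender="female")
```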
## Model Compatibility
This tokenizer is designed for:
- Phoneme-only TTS training
- Models that use Misaki phoneme set
- Audio codec models with 12,801 codes