Phoneme-Based TTS Tokenizer

This tokenizer is designed for phoneme-based Text-to-Speech models using the Misaki phoneme set.

Vocabulary Structure

  • Total Size: 12,869
  • Phonemes: 49 (Misaki set)
    • Shared (US/UK): 41
    • American-only: 4
    • British-only: 4
  • Audio Codes: 12,801 (<|code_0|> to <|code_12800|>)
  • Special Tokens: 18
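
The audio-code range implies a simple naming scheme that is easy to reproduce when converting between codec indices and token strings. The following is a minimal sketch that assumes only the `<|code_N|>` format listed above. The helper names `code_token` and `code_index` are hypothetical and are not part of the tokenizer.

```python
import re

NUM_AUDIO_CODES = 12_801  # <|code_0|> through <|code_12800|>
_CODE_RE = re.compile(r"<\|code_(\d+)\|>")


def code_token(index: int) -> str:
    """Return the token string for a codec index (hypothetical helper)."""
    if not 0 <= index < NUM_AUDIO_CODES:
        raise ValueError(f"audio code index out of range: {index}")
    return f"<|code_{index}|>"


def code_index(token: str) -> int:
    """Parse a token string back into its codec index (hypothetical helper)."""
    match = _CODE_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not an audio code token: {token!r}")
    index = int(match.group(1))
    if index >= NUM_AUDIO_CODES:
        raise ValueError(f"audio code index out of range: {index}")
    return index


# Enumerate every audio code token in index order.
audio_tokens = [code_token(i) for i in range(NUM_AUDIO_CODES)]
print(len(audio_tokens), audio_tokens[0], audio_tokens[-1])
```

Whether these strings map to contiguous token IDs in the vocabulary is not stated here, so look IDs up through the tokenizer rather than computing them from an offset.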

Phoneme Set (Misaki - 49 total)

Shared Phonemes (41)

Stress Marks (2):

  • ˈ - Primary stress
  • ˌ - Secondary stress

Consonants (24):

  • Simple: b d f h j k l m n p s t v w z
  • Special: ɡ ŋ ɹ ʃ ʒ ð θ
  • Clusters: ʤ ʧ

Vowels (10):

  • ə i u ɑ ɔ ɛ ɜ ɪ ʊ ʌ

Diphthongs (4):

  • A (eɪ) I (aɪ) W (aʊ) Y (ɔɪ)

Custom (1):

  • ᵊ - Small schwa

American Phonemes (4)

  • æ O ᵻ ɾ

British Phonemes (4)

  • a Q ɒ ː
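
Because the accent-specific symbols differ, it can help to check a phoneme string against the set for its accent before tokenizing. The sketch below transcribes the symbols listed above into Python sets. The function `unknown_symbols` is a hypothetical helper, and it assumes that whitespace is the only non-phoneme character in the input; how the tokenizer treats spaces and punctuation is not specified here.

```python
# Symbols copied from the phoneme tables above.
SHARED = set(
    "ˈˌ"                # stress marks
    "bdfhjklmnpstvwz"   # simple consonants
    "ɡŋɹʃʒðθ"           # special consonants
    "ʤʧ"                # clusters
    "əiuɑɔɛɜɪʊʌ"        # vowels
    "AIWY"              # diphthongs
    "ᵊ"                 # small schwa
)
US_ONLY = set("æOᵻɾ")
UK_ONLY = set("aQɒː")
ACCENT_SETS = {"us": SHARED | US_ONLY, "uk": SHARED | UK_ONLY}


def unknown_symbols(phonemes: str, accent: str) -> list[str]:
    """Return symbols outside the accent's set, in first-seen order.

    Hypothetical helper; whitespace is skipped by assumption.
    """
    allowed = ACCENT_SETS[accent]
    found: list[str] = []
    for ch in phonemes:
        if ch.isspace() or ch in allowed or ch in found:
            continue
        found.append(ch)
    return found


print(len(SHARED), len(SHARED | US_ONLY | UK_ONLY))  # 41 49
print(unknown_symbols("həlˈO wˈɜɹld", "us"))          # []
print(unknown_symbols("həlˈO wˈɜɹld", "uk"))          # ['O']
```

A non-empty result points to a symbol from the other accent or from plain IPA, such as `oʊ` written out instead of `O`.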

Usage

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("zuhri025/phoneme-tts-tokenizer")

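# Encode phonemes (American English example; Misaki writes the "oʊ" diphthong
# as O, and the length mark ː belongs to the British-only set)
phonemes = "həlˈO wˈɜɹld"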
tokens = tokenizer.encode(phonemes)

# Decode
decoded = tokenizer.decode(tokens)

Special Tokens

  • <|start_of_speech|> - Begin speech sequence
  • <|end_of_speech|> - End speech sequence
  • <|start_of_phonemes|> - Begin phoneme sequence
  • <|end_of_phonemes|> - End phoneme sequence
  • <|speech|> - Transition to audio tokens
  • <|accent_us|> / <|accent_uk|> - Accent conditioning
  • <|male|> / <|female|> - Gender conditioning
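
To illustrate how these tokens might frame a training or inference sequence, here is a minimal sketch of a prompt builder. The token strings come from the list above, but the ordering (conditioning tokens, then the phoneme span, then `<|speech|>` and `<|start_of_speech|>`) is an assumption made for illustration and is not confirmed by this card. The function name `build_prompt` is hypothetical.

```python
ACCENT_TOKENS = {"us": "<|accent_us|>", "uk": "<|accent_uk|>"}
GENDER_TOKENS = {"male": "<|male|>", "female": "<|female|>"}


def build_prompt(phonemes: str, accent: str = "us", gender: str = "female") -> str:
    """Assemble a prompt string (hypothetical helper, ASSUMED token order)."""
    if accent not in ACCENT_TOKENS:
        raise ValueError(f"unknown accent: {accent!r}")
    if gender not in GENDER_TOKENS:
        raise ValueError(f"unknown gender: {gender!r}")
    return "".join(
        [
            # ASSUMPTION: conditioning tokens come first.
            ACCENT_TOKENS[accent],
            GENDER_TOKENS[gender],
            # Phoneme span delimited by its start and end markers.
            "<|start_of_phonemes|>",
            phonemes,
            "<|end_of_phonemes|>",
            # ASSUMPTION: the model continues with audio codes after these
            # two tokens and emits <|end_of_speech|> when it is done.
            "<|speech|>",
            "<|start_of_speech|>",
        ]
    )


prompt = build_prompt("həlˈO wˈɜɹld", accent="us", gender="female")
print(prompt)
```

Check the order against the training data format of the target model before relying on it. The resulting string can then be passed to `tokenizer.encode`.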

Model Compatibility

This tokenizer is designed for:

  • Phoneme-only TTS training
  • Models that use the Misaki phoneme set
  • Audio codec models with 12,801 codes