Solution: Multi-Voice TTS with Transformers.js (Browser-Only)

#9
by masbudjj - opened

Multi-Voice TTS - Browser-Only Solution

Problem Solved:

  • Kokoro-82M needs a backend server (it is not browser-compatible)
  • Transformers.js supports only a limited set of TTS models
  • Multiple voices are needed without any server dependency

Solution:

24 distinct voices from a single SpeechT5 model, produced by transforming its speaker embedding!

Implementation:

  • Base: SpeechT5 (Xenova/speecht5_tts)
  • Voice Profiles: 24 unique character embeddings
  • Transformations: Pitch, Energy, Spectral shaping
  • Customization: User sliders for pitch & energy
  • 100% Browser: No server/API needed!
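
The post does not show how the embedding transformations are computed, so the following is only a minimal sketch of the idea, not the author's method. It assumes the base voice is a 512-value SpeechT5 speaker embedding (x-vector), and it invents a simple rule: scale the first half of the vector by the pitch factor, the second half by the energy factor, add a small seeded jitter per voice, then rescale to the original length. `VoiceProfile`, `transformEmbedding`, `strength`, and the half/half split are all hypothetical.

```typescript
// Hypothetical voice profile; none of these fields come from the original post.
interface VoiceProfile {
  name: string;
  seed: number;   // drives a deterministic per-voice jitter
  pitch: number;  // 1.0 = unchanged
  energy: number; // 1.0 = unchanged
}

const EMBEDDING_SIZE = 512; // length of a SpeechT5 x-vector

// Small deterministic PRNG (mulberry32) so a given seed always yields the same voice.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function l2Norm(v: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < v.length; i++) sum += v[i] * v[i];
  return Math.sqrt(sum);
}

// Assumption: which dimensions relate to pitch or energy is NOT known;
// the half/half split below is purely illustrative.
function transformEmbedding(
  base: Float32Array,
  profile: VoiceProfile,
  strength = 0.1,
): Float32Array {
  if (base.length !== EMBEDDING_SIZE) {
    throw new Error(`expected ${EMBEDDING_SIZE} values, got ${base.length}`);
  }
  const rand = mulberry32(profile.seed);
  const out = new Float32Array(base.length);
  for (let i = 0; i < base.length; i++) {
    const gain = i < base.length / 2 ? profile.pitch : profile.energy;
    const jitter = (rand() * 2 - 1) * strength; // in [-strength, strength)
    out[i] = base[i] * gain * (1 + jitter);
  }
  // Rescale so the result keeps the norm of the base embedding.
  const outNorm = l2Norm(out);
  if (outNorm > 0) {
    const scale = l2Norm(base) / outNorm;
    for (let i = 0; i < out.length; i++) out[i] *= scale;
  }
  return out;
}
```

Keeping the norm fixed is a conservative design choice: the model was trained on embeddings of a particular scale, so the sketch only changes direction, not magnitude.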

Voice Categories:

  1. American Female (6 voices)
  2. American Male (6 voices)
  3. British Female (4 voices)
  4. British Male (4 voices)
  5. International (4 voices)
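
One possible way to represent these categories in code is a small catalog that expands into 24 voice identifiers. The category ids and the `buildVoiceIds` helper are hypothetical; only the labels and counts come from the list above.

```typescript
// Labels and counts mirror the list above; the ids are made up for illustration.
interface VoiceCategory {
  id: string;
  label: string;
  count: number;
}

const VOICE_CATEGORIES: readonly VoiceCategory[] = [
  { id: "american-female", label: "American Female", count: 6 },
  { id: "american-male", label: "American Male", count: 6 },
  { id: "british-female", label: "British Female", count: 4 },
  { id: "british-male", label: "British Male", count: 4 },
  { id: "international", label: "International", count: 4 },
];

// Expand the catalog into one id per voice, e.g. "british-male-3".
function buildVoiceIds(categories: readonly VoiceCategory[] = VOICE_CATEGORIES): string[] {
  const ids: string[] = [];
  for (const category of categories) {
    for (let n = 1; n <= category.count; n++) {
      ids.push(`${category.id}-${n}`);
    }
  }
  return ids;
}
```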

Features:

  • 24 base voices
  • Pitch control (0.5x - 1.5x)
  • Energy control (0.5x - 1.5x)
  • Speed control (0.5x - 2.0x)
  • Continuous sliders, so the voice combinations are effectively unlimited!
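
The slider ranges above could be enforced with a small clamp step before synthesis. The post does not say how speed is applied, so the resampling function below is an assumption: it changes speed by linear interpolation over the generated samples, which also shifts pitch (a browser could instead set a playback rate on the audio element). `VoiceSettings`, `clampSettings`, and `changeSpeed` are hypothetical names.

```typescript
interface VoiceSettings {
  pitch: number;
  energy: number;
  speed: number;
}

// Ranges taken from the feature list above.
const RANGES: Record<keyof VoiceSettings, [number, number]> = {
  pitch: [0.5, 1.5],
  energy: [0.5, 1.5],
  speed: [0.5, 2.0],
};

function clampValue(value: number, range: [number, number]): number {
  if (!Number.isFinite(value)) return 1.0; // fall back to "unchanged"
  return Math.min(range[1], Math.max(range[0], value));
}

function clampSettings(raw: VoiceSettings): VoiceSettings {
  return {
    pitch: clampValue(raw.pitch, RANGES.pitch),
    energy: clampValue(raw.energy, RANGES.energy),
    speed: clampValue(raw.speed, RANGES.speed),
  };
}

// Assumption: naive speed change by linear-interpolation resampling.
// speed 2.0 halves the number of samples; speed 0.5 doubles it.
function changeSpeed(audio: Float32Array, speed: number): Float32Array {
  if (audio.length === 0) return new Float32Array(0);
  const factor = clampValue(speed, RANGES.speed);
  const outLength = Math.max(1, Math.round(audio.length / factor));
  const out = new Float32Array(outLength);
  const last = audio.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * factor;
    const i0 = Math.min(Math.floor(pos), last);
    const i1 = Math.min(i0 + 1, last);
    const frac = pos - Math.floor(pos);
    out[i] = audio[i0] * (1 - frac) + audio[i1] * frac;
  }
  return out;
}
```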

Technology:

  • Transformers.js 3.1.2
  • ONNX Runtime (WASM)
  • Speaker embedding transformation
  • Real-time voice customization
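
To show how these pieces might fit together, here is a hedged sketch that takes a synthesizer function and turns its output into a WAV buffer the browser can play. To my understanding, a Transformers.js text-to-speech pipeline for `Xenova/speecht5_tts` accepts a `speaker_embeddings` option and returns an object with `audio` and `sampling_rate`, but treat that shape as an assumption and check it against the version you use. The `Synthesizer` type, `synthesizeToWav`, and `encodeWav` are hypothetical helpers, and the pipeline is passed in as a parameter so the block stays free of external imports.

```typescript
// Assumed result shape of a text-to-speech pipeline call.
interface SpeechOutput {
  audio: Float32Array;
  sampling_rate: number;
}

type Synthesizer = (
  text: string,
  options: { speaker_embeddings: Float32Array },
) => Promise<SpeechOutput>;

// In a browser app the synthesizer would be created roughly like this (not run here):
//   import { pipeline } from "@huggingface/transformers";
//   const synthesizer = await pipeline("text-to-speech", "Xenova/speecht5_tts");

// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const dataBytes = samples.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true);
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Run the (injected) synthesizer with a custom embedding and return WAV bytes.
async function synthesizeToWav(
  synthesizer: Synthesizer,
  text: string,
  embedding: Float32Array,
): Promise<ArrayBuffer> {
  const result = await synthesizer(text, { speaker_embeddings: embedding });
  return encodeWav(result.audio, result.sampling_rate);
}
```

In a page, the returned buffer could be wrapped in a `Blob` with type `audio/wav` and handed to an `<audio>` element through `URL.createObjectURL`.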

Benefits:

  • Works 100% in browser
  • No server costs
  • Fast generation (2-5s)
  • Privacy-focused
  • Offline capable
masbudjj changed pull request status to merged
