---
title: Multi-Voice TTS - 24 Unique Voices
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
license: apache-2.0
---

🎙️ Multi-Voice Text-to-Speech

24 Unique Voices - 100% Browser-Based - No Server Required

✨ Features

🎭 24 Unique Voice Characters

🇺🇸 American Female (6 voices)

  • Default - Neutral baseline
  • Warm - Friendly & caring
  • Bright - Energetic & happy
  • Soft - Gentle & calm
  • Clear - Professional
  • Smooth - Elegant

🇺🇸 American Male (6 voices)

  • Default - Neutral baseline
  • Deep - Authoritative
  • Friendly - Approachable
  • Strong - Confident
  • Calm - Relaxed
  • Professional - Business-oriented

🇬🇧 British Female (4 voices)

  • Refined - Elegant
  • Bright - Cheerful
  • Soft - Gentle
  • Clear - Articulate

🇬🇧 British Male (4 voices)

  • Distinguished - Formal
  • Smooth - Sophisticated
  • Warm - Friendly
  • Strong - Commanding

🌏 International (4 voices)

  • Neutral - Standard
  • Soft - Gentle
  • Clear - Professional
  • Warm - Friendly

🎨 Voice Customization

Each voice can be further customized with:

  • Pitch Control (0.5x - 1.5x) - Adjust voice pitch
  • Energy Control (0.5x - 1.5x) - Modify speaking energy
  • Speed Control (0.5x - 2.0x) - Playback speed
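A minimal sketch of how the three slider values might be validated before use, written in JavaScript to match the profile code in this README. Only the ranges come from the list above; the helper names (`SETTING_RANGES`, `clampSettings`) and the choice to fall back to a neutral 1.0 for missing values are assumptions for illustration.

```javascript
// Ranges taken from the customization list; names and defaults are hypothetical.
const SETTING_RANGES = {
  pitch: [0.5, 1.5],
  energy: [0.5, 1.5],
  speed: [0.5, 2.0],
};

// Clamp one value into [min, max]; non-numeric or missing input becomes 1.0.
function clampValue(value, [min, max]) {
  const v = Number.isFinite(value) ? value : 1.0;
  return Math.min(max, Math.max(min, v));
}

// Returns a new object with every known setting clamped into its range.
function clampSettings(raw = {}) {
  const out = {};
  for (const [name, range] of Object.entries(SETTING_RANGES)) {
    out[name] = clampValue(raw[name], range);
  }
  return out;
}
```

For example, `clampSettings({ pitch: 2, speed: 0.25 })` returns `{ pitch: 1.5, energy: 1, speed: 0.5 }`. Because speed is described as a playback control, one plausible way to apply it in the browser is through an audio element's `playbackRate` property rather than by changing the embedding.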

Total Combinations: 24 voices × continuously adjustable pitch and energy = a practically unlimited range of variations!


🏗️ Technology

Base Model

  • SpeechT5 from Microsoft
  • ONNX Runtime for browser execution
  • WebAssembly backend

Voice Generation

Each of the 24 voices is created by:

  1. Taking a base speaker embedding (512-dim)
  2. Applying a pitch transformation
  3. Modulating energy levels
  4. Applying spectral shaping for character
  5. Adjusting prosody
  6. Normalizing the result
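The README does not spell out the math behind these steps, so the following is only a hypothetical sketch of how such a pipeline could be wired together over a 512-dimensional `Float32Array`. The operations chosen (scaling the first half of the vector for pitch, the second half for energy, and adding a linear tilt for spectral shaping) are illustrative assumptions; x-vector dimensions have no documented one-to-one link to pitch or energy. Step 5 (prosody) is left out because the voice profiles expose no parameter for it. Only the final step matches a stated detail: the Z-score normalization listed under Technical Details. Note that a uniform scale of the whole vector would be undone by Z-score normalization, which is why each factor here touches only part of the vector.

```javascript
// HYPOTHETICAL sketch of steps 1-4 and 6. The mapping of "pitch", "energy"
// and "spectral" onto embedding dimensions is invented for illustration.
const EMBEDDING_DIM = 512;

// Step 6: Z-score normalization (mean 0, standard deviation 1).
function zScore(vec) {
  const n = vec.length;
  const mean = vec.reduce((s, x) => s + x, 0) / n;
  const variance = vec.reduce((s, x) => s + (x - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance) || 1; // avoid dividing by zero for a constant vector
  return vec.map((x) => (x - mean) / std);
}

function buildVoiceEmbedding(base, { pitch = 1, energy = 1, spectral = 0 } = {}) {
  if (base.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} values, got ${base.length}`);
  }
  const half = EMBEDDING_DIM / 2;
  const out = Float32Array.from(base); // step 1: copy the base speaker embedding
  for (let i = 0; i < EMBEDDING_DIM; i++) {
    if (i < half) out[i] *= pitch; // step 2 (assumed): scale the first half
    else out[i] *= energy; // step 3 (assumed): scale the second half
    // step 4 (assumed): linear tilt running from +spectral to -spectral
    out[i] += spectral * ((2 * i) / (EMBEDDING_DIM - 1) - 1);
  }
  return zScore(out);
}
```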

🚀 Highlights

✅ 24 Unique Voices - Diverse characters
✅ 100% Browser-Based - Inference runs locally, no backend server needed
✅ Voice Customization - Pitch & energy controls
✅ Fast Generation - Typically 2-5 seconds, depending on device and text length
✅ High Quality - SpeechT5 architecture
✅ Offline Capable - Works after first load
✅ Privacy Focused - Input text is not sent to any server
✅ Free & Open Source - Apache 2.0 license


💻 How It Works

Voice Profile System

const VOICE_PROFILES = {
  af_warm: {
    pitch: 0.95,    // Slightly lower
    energy: 1.1,    // More energetic
    spectral: 0.2   // Brighter tone
  },
  am_deep: {
    pitch: 0.7,     // Much lower
    energy: 1.1,    // Strong
    spectral: -0.5  // Darker tone
  },
  // ... 24 total profiles
};
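One way the profile table above could be combined with the user's pitch and energy sliders, assuming the slider values act as multipliers on the profile's own factors (the README does not say how the two are combined). Only the two profiles shown in the README are included, and `resolveVoiceParams` is a hypothetical helper.

```javascript
// Two of the 24 profiles, copied from the README; the others are elided there too.
const VOICE_PROFILES = {
  af_warm: { pitch: 0.95, energy: 1.1, spectral: 0.2 },
  am_deep: { pitch: 0.7, energy: 1.1, spectral: -0.5 },
};

// HYPOTHETICAL: user sliders multiply the profile's pitch/energy factors;
// the spectral setting stays as defined by the profile.
function resolveVoiceParams(voiceId, { pitch = 1, energy = 1 } = {}) {
  if (!Object.prototype.hasOwnProperty.call(VOICE_PROFILES, voiceId)) {
    throw new Error(`unknown voice: ${voiceId}`);
  }
  const profile = VOICE_PROFILES[voiceId];
  return {
    pitch: profile.pitch * pitch,
    energy: profile.energy * energy,
    spectral: profile.spectral,
  };
}
```

Under this assumption, `resolveVoiceParams("am_deep", { pitch: 1.2 })` yields a pitch factor of about 0.84, with energy and spectral left at 1.1 and -0.5.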

Generation Process

User Input Text
     ↓
Select Voice Profile
     ↓
Load Base Speaker Embedding
     ↓
Apply Transformations:
  - Pitch modification
  - Energy modulation
  - Spectral shaping
  - User adjustments (pitch/energy sliders)
     ↓
Normalize Embedding
     ↓
SpeechT5 Generation
     ↓
WAV Output
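The last stages of the flow above can be sketched as a thin wrapper around a Transformers.js text-to-speech pipeline. In Transformers.js, SpeechT5 pipelines accept a `speaker_embeddings` option (a URL, `Float32Array` or tensor) and resolve to an object with `audio` (a `Float32Array`) and `sampling_rate`. The synthesizer is passed in as a parameter so the sketch can run with a stub; `generateSpeech` is a hypothetical name, and the commented setup assumes the v3 package name `@huggingface/transformers`.

```javascript
// Hypothetical wrapper for the "SpeechT5 Generation -> WAV Output" steps.
// In the browser, `synthesizer` would typically come from Transformers.js:
//   import { pipeline } from '@huggingface/transformers';
//   const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts');
// It is injected here so the function can be exercised without a model.
const EXPECTED_DIM = 512;
const EXPECTED_RATE = 16000;

async function generateSpeech(synthesizer, text, embedding) {
  const trimmed = text.trim();
  if (!trimmed) throw new Error("text is empty");
  if (embedding.length !== EXPECTED_DIM) {
    throw new Error(`speaker embedding must have ${EXPECTED_DIM} values`);
  }
  // SpeechT5 pipelines take the speaker embedding via `speaker_embeddings`.
  const result = await synthesizer(trimmed, {
    speaker_embeddings: Float32Array.from(embedding),
  });
  if (result.sampling_rate !== EXPECTED_RATE) {
    // Not fatal; flagged because the README documents 16 kHz output.
    console.warn(`unexpected sampling rate: ${result.sampling_rate}`);
  }
  return {
    audio: result.audio,
    sampleRate: result.sampling_rate,
    durationSec: result.audio.length / result.sampling_rate,
  };
}
```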

🎯 Use Cases

Professional/Corporate:

  • af_clear, am_professional, bf_clear, bm_distinguished

Friendly/Casual:

  • af_warm, am_friendly, bf_bright, int_warm

Storytelling/Narration:

  • af_smooth, am_calm, bf_refined, bm_smooth

Energetic/Marketing:

  • af_bright, am_strong, bf_bright
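The voice IDs in these lists appear to follow a prefix convention that mirrors the voice groups under Features: `af`/`am` for American female/male, `bf`/`bm` for British female/male, and `int` for International, followed by the style name. That mapping is inferred from the lists rather than stated in the README, so the hypothetical parser below should be read as an assumption.

```javascript
// Prefix meanings are INFERRED from the voice lists in this README.
const VOICE_GROUPS = new Map([
  ["af", { accent: "American", gender: "Female" }],
  ["am", { accent: "American", gender: "Male" }],
  ["bf", { accent: "British", gender: "Female" }],
  ["bm", { accent: "British", gender: "Male" }],
  ["int", { accent: "International", gender: null }], // no gender stated for this group
]);

// Splits an ID such as "bm_distinguished" into its group and style parts.
function parseVoiceId(id) {
  const match = /^([a-z]+)_([a-z]+)$/.exec(id);
  const group = match ? VOICE_GROUPS.get(match[1]) : undefined;
  if (!group) throw new Error(`unrecognized voice id: ${id}`);
  return { ...group, style: match[2] };
}
```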

📊 Comparison

| Feature | This App | SpeechT5 Basic | Kokoro-82M |
|---|---|---|---|
| Voices | 24 | 1 | 54 |
| Browser | ✅ Yes | ✅ Yes | ❌ No |
| Customization | ✅ Pitch/Energy | ❌ Limited | ✅ Yes |
| Server | ❌ Not needed | ❌ Not needed | ✅ Required |
| Speed | ⚡ Fast | ⚡ Fast | ⏱️ Medium |

πŸ”§ Technical Details

  • Model: Xenova/speecht5_tts
  • Size: ~50MB (cached after first load)
  • Format: ONNX (quantized)
  • Sample Rate: 16kHz
  • Output: WAV (16-bit PCM)

  • Voice Embedding: 512-dimensional vector
  • Transformations: Pitch, energy, spectral
  • Normalization: Z-score (mean=0, std=1)
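Transformers.js returns generated audio as a `Float32Array` of samples, typically in the range [-1, 1], so producing the 16 kHz, 16-bit PCM WAV output listed above needs a conversion step along these lines. This is a minimal sketch of the standard mono RIFF/WAVE layout (a 44-byte header followed by little-endian 16-bit samples); `encodeWav` is a hypothetical helper, not a function from the app or the library.

```javascript
// Minimal mono 16-bit PCM WAV encoder (hypothetical helper).
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for PCM
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true); // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true); // data chunk size

  // Clamp each float to [-1, 1] and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the returned buffer can be wrapped as `new Blob([buffer], { type: "audio/wav" })` and played or downloaded through an object URL.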


📝 License

Apache 2.0 - Free for personal and commercial use


🙏 Credits

  • Base Model: Microsoft SpeechT5
  • ONNX Conversion: Xenova/transformers.js
  • Voice Profiles: Custom implementation
  • UI: Modern glassmorphism design

Built with ❤️ using Transformers.js