title: Multi-Voice TTS - 24 Unique Voices
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
license: apache-2.0

🎙️ Multi-Voice Text-to-Speech

24 Unique Voices - 100% Browser-Based - No Server Required

✨ Features

🎭 24 Unique Voice Characters

🇺🇸 American Female (6 voices)

Default - Neutral baseline
Warm - Friendly & caring
Bright - Energetic & happy
Soft - Gentle & calm
Clear - Professional
Smooth - Elegant

🇺🇸 American Male (6 voices)

Default - Neutral baseline
Deep - Authoritative
Friendly - Approachable
Strong - Confident
Calm - Relaxed
Professional - Business-oriented

🇬🇧 British Female (4 voices)

Refined - Elegant
Bright - Cheerful
Soft - Gentle
Clear - Articulate

🇬🇧 British Male (4 voices)

Distinguished - Formal
Smooth - Sophisticated
Warm - Friendly
Strong - Commanding

🌏 International (4 voices)

Neutral - Standard
Soft - Gentle
Clear - Professional
Warm - Friendly

🎨 Voice Customization

Each voice can be further customized with:

Pitch Control (0.5x - 1.5x) - Adjust voice pitch
Energy Control (0.5x - 1.5x) - Modify speaking energy
Speed Control (0.5x - 2.0x) - Playback speed

Total Combinations: 24 voices, each adjustable over continuous pitch, energy, and speed ranges, so the number of distinct outputs is effectively unlimited.
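
To make the slider ranges above concrete, here is a minimal sketch of how control values might be clamped before use. The names `CONTROL_RANGES` and `clampControls` are hypothetical, and the fallback of 1.0 for a missing value is an assumption, not something this README specifies.

```javascript
// Slider ranges as documented in the Voice Customization list.
const CONTROL_RANGES = {
  pitch: { min: 0.5, max: 1.5 },
  energy: { min: 0.5, max: 1.5 },
  speed: { min: 0.5, max: 2.0 },
};

// Clamp each control into its documented range.
// Missing or non-numeric values fall back to 1.0 (assumed neutral setting).
function clampControls(controls = {}) {
  const result = {};
  for (const [name, { min, max }] of Object.entries(CONTROL_RANGES)) {
    const raw = Number.parseFloat(controls[name]);
    const value = Number.isFinite(raw) ? raw : 1.0;
    result[name] = Math.min(max, Math.max(min, value));
  }
  return result;
}
```

Because `parseFloat` is used, string values read directly from an `<input type="range">` element would also be accepted.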

🏗️ Technology

Base Model

SpeechT5 from Microsoft
ONNX Runtime for browser execution
WebAssembly backend

Voice Generation

Each of the 24 voices is created by:

Taking the base speaker embedding (512-dim)
Applying a pitch transformation
Modulating energy levels
Shaping the spectrum for character
Adjusting prosody
Normalizing the result
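
The final normalization step can be sketched as follows, assuming it is the Z-score normalization (mean=0, std=1) listed under Technical Details. The function name `normalizeEmbedding` is hypothetical, and using the population standard deviation is an assumption.

```javascript
// Z-score normalize a speaker embedding:
// subtract the mean, then divide by the standard deviation.
function normalizeEmbedding(embedding) {
  const n = embedding.length;

  let sum = 0;
  for (const v of embedding) sum += v;
  const mean = sum / n;

  let sqDiff = 0;
  for (const v of embedding) sqDiff += (v - mean) ** 2;
  const std = Math.sqrt(sqDiff / n); // population standard deviation (assumed)

  const out = new Float32Array(n);
  if (std === 0) return out; // constant input: return zeros instead of dividing by zero
  for (let i = 0; i < n; i++) out[i] = (embedding[i] - mean) / std;
  return out;
}
```

One consequence worth keeping in mind: any uniform scale or offset applied to the whole embedding before this step is removed by it, so only changes that vary across dimensions survive normalization.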

🚀 Highlights

✅ 24 Unique Voices - Diverse characters
✅ 100% Browser-Based - No server needed
✅ Voice Customization - Pitch & energy controls
✅ Fast Generation - 2-5 seconds
✅ High Quality - SpeechT5 architecture
✅ Offline Capable - Works after first load
✅ Privacy Focused - No data sent to servers
✅ Free & Open Source - Apache 2.0 license

💻 How It Works

Voice Profile System

const VOICE_PROFILES = {
  af_warm: {
    pitch: 0.95,    // Slightly lower
    energy: 1.1,    // More energetic
    spectral: 0.2   // Brighter tone
  },
  am_deep: {
    pitch: 0.7,     // Much lower
    energy: 1.1,    // Strong
    spectral: -0.5  // Darker tone
  },
  // ... 24 total profiles
};
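
A preset like the ones above still has to be combined with the user's slider values. The sketch below shows one way that might work. The helper `resolveProfile` is hypothetical, and multiplying the preset by the slider value is an assumption; this README does not say how the two are combined.

```javascript
// Two presets copied from the profile table above; the others are omitted.
const EXAMPLE_PROFILES = {
  af_warm: { pitch: 0.95, energy: 1.1, spectral: 0.2 },
  am_deep: { pitch: 0.7, energy: 1.1, spectral: -0.5 },
};

// Combine a preset with slider values.
// Assumption: sliders act as multipliers, so 1.0 leaves the preset unchanged.
function resolveProfile(profiles, voiceId, sliders = {}) {
  const base = profiles[voiceId];
  if (!base) throw new Error(`Unknown voice: ${voiceId}`);

  const { pitch = 1.0, energy = 1.0 } = sliders;
  return {
    pitch: base.pitch * pitch,
    energy: base.energy * energy,
    spectral: base.spectral, // the README lists no slider for spectral shaping
  };
}
```

For example, `resolveProfile(EXAMPLE_PROFILES, "am_deep", { pitch: 1.2 })` would raise the deep preset's pitch factor while leaving its energy and spectral values as defined.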

Generation Process

User Input Text
     ↓
Select Voice Profile
     ↓
Load Base Speaker Embedding
     ↓
Apply Transformations:
  - Pitch modification
  - Energy modulation
  - Spectral shaping
  - User adjustments (pitch/energy sliders)
     ↓
Normalize Embedding
     ↓
SpeechT5 Generation
     ↓
WAV Output
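
The flow above can be sketched as a single function. This is an illustration under stated assumptions rather than the app's actual code: `generateSpeech`, `transform`, and `synthesize` are hypothetical names, and the transformation and synthesis steps are passed in as functions so the sketch does not depend on how either is implemented. The result shape (`audio`, `sampling_rate`) follows what a Transformers.js text-to-speech pipeline returns.

```javascript
// Run the documented flow: validate text, transform the base embedding,
// synthesize audio, and return the samples with their sample rate.
//
// In the browser, `synthesize` could wrap a Transformers.js pipeline, e.g.
//   const tts = await pipeline("text-to-speech", "Xenova/speecht5_tts");
//   const synthesize = (text, emb) => tts(text, { speaker_embeddings: emb });
async function generateSpeech({ text, baseEmbedding, transform, synthesize }) {
  if (!text || !text.trim()) throw new Error("Text is empty");

  // Pitch, energy, spectral changes and normalization all live in `transform`.
  const embedding = transform(baseEmbedding);

  const { audio, sampling_rate } = await synthesize(text, embedding);
  return { samples: audio, sampleRate: sampling_rate };
}
```

Injecting `synthesize` also makes the flow easy to exercise with a stub that returns silence, without downloading the model.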

🎯 Use Cases

Professional/Corporate:

af_clear, am_professional, bf_clear, bm_distinguished

Friendly/Casual:

af_warm, am_friendly, bf_bright, int_warm

Storytelling/Narration:

af_smooth, am_calm, bf_refined, bm_smooth

Energetic/Marketing:

af_bright, am_strong, bf_bright
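
The voice IDs used above appear to follow a `prefix_style` pattern. The sketch below splits an ID into its group and style. The prefix-to-group mapping is inferred from the IDs and the voice groups in this README, and `parseVoiceId` is a hypothetical helper.

```javascript
// Prefix-to-group mapping, inferred from the IDs and voice groups in this README.
const VOICE_GROUPS = {
  af: "American Female",
  am: "American Male",
  bf: "British Female",
  bm: "British Male",
  int: "International",
};

// Split an ID such as "bm_distinguished" into its group and style.
function parseVoiceId(voiceId) {
  const sep = voiceId.indexOf("_");
  const prefix = sep === -1 ? "" : voiceId.slice(0, sep);
  const group = VOICE_GROUPS[prefix];
  if (!group) throw new Error(`Unrecognized voice id: ${voiceId}`);
  return { group, style: voiceId.slice(sep + 1) };
}
```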

📊 Comparison

| Feature | This App | SpeechT5 Basic | Kokoro-82M |
|---|---|---|---|
| Voices | 24 | 1 | 54 |
| Browser | ✅ Yes | ✅ Yes | ❌ No |
| Customization | ✅ Pitch/Energy | ❌ Limited | ✅ Yes |
| Server | ❌ Not needed | ❌ Not needed | ✅ Required |
| Speed | ⚡ Fast | ⚡ Fast | ⏱️ Medium |

🔧 Technical Details

Model: Xenova/speecht5_tts
Size: ~50 MB (cached after first load)
Format: ONNX (quantized)
Sample Rate: 16 kHz
Output: WAV (16-bit PCM)
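
For the output format listed above, the following sketch shows how mono float samples could be packed into a 16-bit PCM WAV file at 16 kHz. The function name `encodeWav` is hypothetical, and this is not necessarily how the app writes its output; the header layout is the standard 44-byte RIFF/WAVE header.

```javascript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample, then scale to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In the browser, the returned buffer can be wrapped as `new Blob([buffer], { type: "audio/wav" })` and played or downloaded through an object URL.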

Voice Embedding: 512-dimensional vector
Transformations: Pitch, energy, spectral
Normalization: Z-score (mean=0, std=1)

📝 License

Apache 2.0 - Free for personal and commercial use

🙏 Credits

Base Model: Microsoft SpeechT5
ONNX Conversion: Xenova/transformers.js
Voice Profiles: Custom implementation
UI: Modern glassmorphism design

Built with ❤️ using Transformers.js