title: Multi-Voice TTS - 24 Unique Voices
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: static
pinned: false
license: apache-2.0
🎙️ Multi-Voice Text-to-Speech
24 Unique Voices - 100% Browser-Based - No Server Required
✨ Features
24 Unique Voice Characters
🇺🇸 American Female (6 voices)
- Default - Neutral baseline
- Warm - Friendly & caring
- Bright - Energetic & happy
- Soft - Gentle & calm
- Clear - Professional
- Smooth - Elegant
🇺🇸 American Male (6 voices)
- Default - Neutral baseline
- Deep - Authoritative
- Friendly - Approachable
- Strong - Confident
- Calm - Relaxed
- Professional - Business-oriented
🇬🇧 British Female (4 voices)
- Refined - Elegant
- Bright - Cheerful
- Soft - Gentle
- Clear - Articulate
🇬🇧 British Male (4 voices)
- Distinguished - Formal
- Smooth - Sophisticated
- Warm - Friendly
- Strong - Commanding
International (4 voices)
- Neutral - Standard
- Soft - Gentle
- Clear - Professional
- Warm - Friendly
🎨 Voice Customization
Each voice can be further customized with:
- Pitch Control (0.5x - 1.5x) - Adjust voice pitch
- Energy Control (0.5x - 1.5x) - Modify speaking energy
- Speed Control (0.5x - 2.0x) - Playback speed
Total Combinations: 24 voices × continuously adjustable pitch and energy = effectively unlimited variations.
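The slider ranges listed above can be enforced with a small clamping helper. This is a minimal sketch, not the app's actual code: `CONTROL_RANGES` and `clampControls` are hypothetical names, and falling back to 1.0 for a missing value is an assumption.

```javascript
// Slider ranges as listed in this README; the names here are illustrative only.
const CONTROL_RANGES = {
  pitch: { min: 0.5, max: 1.5 },
  energy: { min: 0.5, max: 1.5 },
  speed: { min: 0.5, max: 2.0 },
};

// Clamp each control to its documented range.
// Assumption: a missing or non-numeric value falls back to 1.0 (no change).
function clampControls(controls = {}) {
  const result = {};
  for (const [name, { min, max }] of Object.entries(CONTROL_RANGES)) {
    const value = Number.isFinite(controls[name]) ? controls[name] : 1.0;
    result[name] = Math.min(max, Math.max(min, value));
  }
  return result;
}
```

Calling `clampControls({ pitch: 3 })` would cap pitch at 1.5 and leave energy and speed at their neutral value.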
Technology
Base Model
- SpeechT5 from Microsoft
- ONNX Runtime for browser execution
- WebAssembly backend
Voice Generation
Each of the 24 voices is created by:
- Taking base speaker embedding (512-dim)
- Applying pitch transformation
- Modulating energy levels
- Spectral shaping for character
- Prosody adjustment
- Normalization
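Of the steps above, only the final normalization is specified precisely in this README (z-score, mean 0 and std 1, under Technical Details). The sketch below illustrates that one step. The function name `zScoreNormalize`, the use of the population standard deviation, and the epsilon guard are assumptions rather than details taken from the app.

```javascript
// Z-score normalize an embedding so that its mean is 0 and its std is 1.
// Assumptions: population standard deviation (divide by N) and a small
// epsilon to avoid dividing by zero for a constant vector.
function zScoreNormalize(embedding, eps = 1e-8) {
  const n = embedding.length;

  let mean = 0;
  for (const v of embedding) mean += v;
  mean /= n;

  let variance = 0;
  for (const v of embedding) variance += (v - mean) ** 2;
  variance /= n;
  const std = Math.sqrt(variance);

  const out = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    out[i] = (embedding[i] - mean) / (std + eps);
  }
  return out;
}
```

Z-score normalization cancels any uniform offset or gain applied beforehand, so for the earlier transformations to survive this step they would have to affect the embedding dimensions unevenly.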
Features
- 24 Unique Voices - Diverse characters
- 100% Browser-Based - No server needed
- Voice Customization - Pitch & energy controls
- Fast Generation - 2-5 seconds
- High Quality - SpeechT5 architecture
- Offline Capable - Works after first load
- Privacy Focused - No data sent to servers
- Free & Open Source - Apache 2.0 license
💻 How It Works
Voice Profile System
```javascript
const VOICE_PROFILES = {
  af_warm: {
    pitch: 0.95,    // Slightly lower
    energy: 1.1,    // More energetic
    spectral: 0.2   // Brighter tone
  },
  am_deep: {
    pitch: 0.7,     // Much lower
    energy: 1.1,    // Strong
    spectral: -0.5  // Darker tone
  },
  // ... 24 total profiles
};
```
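One plausible way to combine a stored profile with the user's slider values is sketched below. It assumes the sliders act as multipliers on the profile's pitch and energy, which this README does not confirm. `resolveVoice` is a hypothetical helper, and only the two profiles shown above are reproduced.

```javascript
// Two profiles copied from this README; the remaining ones are omitted here.
const VOICE_PROFILES = {
  af_warm: { pitch: 0.95, energy: 1.1, spectral: 0.2 },
  am_deep: { pitch: 0.7, energy: 1.1, spectral: -0.5 },
};

// Hypothetical helper: look up a profile and apply the user's sliders.
// Assumption: sliders multiply the profile's pitch and energy, and the
// spectral value passes through unchanged.
function resolveVoice(voiceId, sliders = {}) {
  const profile = VOICE_PROFILES[voiceId];
  if (!profile) {
    throw new Error(`Unknown voice profile: ${voiceId}`);
  }
  const { pitch = 1.0, energy = 1.0 } = sliders;
  return {
    pitch: profile.pitch * pitch,
    energy: profile.energy * energy,
    spectral: profile.spectral,
  };
}
```

With neutral sliders (1.0), `resolveVoice('am_deep')` returns the profile values unchanged.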
Generation Process
User Input Text
↓
Select Voice Profile
↓
Load Base Speaker Embedding
↓
Apply Transformations:
- Pitch modification
- Energy modulation
- Spectral shaping
- User adjustments (pitch/energy sliders)
↓
Normalize Embedding
↓
SpeechT5 Generation
↓
WAV Output
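The SpeechT5 generation step in this flow could be sketched with the Transformers.js `text-to-speech` pipeline, which accepts a `Float32Array` as `speaker_embeddings`. This is a sketch under stated assumptions, not the app's code: `generateSpeech` is a hypothetical wrapper, it assumes the `@xenova/transformers` package is installed, and it loads the package lazily so that the import only happens when speech is requested.

```javascript
// Hypothetical wrapper around the SpeechT5 generation step.
// Assumes @xenova/transformers is installed; it is imported lazily.
async function generateSpeech(text, speakerEmbedding) {
  // SpeechT5 expects a 512-dimensional speaker embedding.
  if (!(speakerEmbedding instanceof Float32Array) || speakerEmbedding.length !== 512) {
    throw new Error('Expected a 512-dimensional Float32Array speaker embedding');
  }

  const { pipeline } = await import('@xenova/transformers');
  const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts');

  // The result has the shape { audio: Float32Array, sampling_rate: number }.
  return synthesizer(text, { speaker_embeddings: speakerEmbedding });
}
```

A real app would likely create the pipeline once and reuse it across requests rather than rebuilding it on every call.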
🎯 Use Cases
Professional/Corporate:
- af_clear, am_professional, bf_clear, bm_distinguished
Friendly/Casual:
- af_warm, am_friendly, bf_bright, int_warm
Storytelling/Narration:
- af_smooth, am_calm, bf_refined, bm_smooth
Energetic/Marketing:
- af_bright, am_strong, bf_bright
Comparison
| Feature | This App | SpeechT5 Basic | Kokoro-82M |
|---|---|---|---|
| Voices | 24 | 1 | 54 |
| Browser | Yes | Yes | No |
| Customization | Pitch/Energy | Limited | Yes |
| Server | Not needed | Not needed | Required |
| Speed | ⚡ Fast | ⚡ Fast | ⏱️ Medium |
🔧 Technical Details
- Model: Xenova/speecht5_tts
- Size: ~50MB (cached after first load)
- Format: ONNX (quantized)
- Sample Rate: 16 kHz
- Output: WAV (16-bit PCM)
- Voice Embedding: 512-dimensional vector
- Transformations: Pitch, energy, spectral
- Normalization: Z-score (mean=0, std=1)
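To illustrate the output format listed above (16-bit PCM WAV at 16 kHz), here is a minimal sketch of an encoder that wraps mono float samples in a standard 44-byte WAV header. The function name `encodeWav` is hypothetical, and mono output is an assumption, since this README does not state the channel count.

```javascript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file.
// Assumption: single-channel audio.
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                // file size minus 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true);                          // fmt chunk size for PCM
  view.setUint16(20, 1, true);                           // audio format 1 = PCM
  view.setUint16(22, 1, true);                           // channels: mono
  view.setUint32(24, sampleRate, true);                  // sample rate
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);                          // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```

In a browser, the returned buffer can be wrapped in `new Blob([buffer], { type: 'audio/wav' })` and passed to `URL.createObjectURL` for playback or download.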
License
Apache 2.0 - Free for personal and commercial use
Credits
- Base Model: Microsoft SpeechT5
- ONNX Conversion: Xenova/transformers.js
- Voice Profiles: Custom implementation
- UI: Modern glassmorphism design
Built with ❤️ using Transformers.js