metadata
title: Kokoro-82M TTS - 54 Premium Voices
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
🎙️ Kokoro-82M Text-to-Speech
World-Class TTS with 54 Premium Voices
✨ Features
54 Premium Voices
🇺🇸 American English (19 voices)
Female (11 voices):
- Heart - Warm & Friendly
- Bella - Elegant & Smooth
- Nicole - Professional
- Aoede - Cheerful
- Kore - Gentle
- Sarah - Clear
- Nova - Modern
- Sky - Light
- Alloy - Versatile
- Jessica - Natural
- River - Calm
Male (8 voices):
- Michael - Deep & Authoritative
- Fenrir - Strong
- Puck - Playful
- Echo - Resonant
- Eric - Professional
- Liam - Friendly
- Onyx - Rich
- Adam - Natural
🇬🇧 British English (8 voices)
Female (4 voices):
- Emma - Refined
- Isabella - Elegant
- Alice - Clear
- Lily - Soft
Male (4 voices):
- George - Distinguished
- Fable - Storyteller
- Lewis - Smooth
- Daniel - Professional
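The voice names above can be turned into model voice identifiers in code. The sketch below assumes the two-letter prefix convention used by the hexgrad/Kokoro-82M voice files (first letter for the accent, `a` for American or `b` for British; second letter for the gender, `f` or `m`), followed by the lowercase name. The `VOICES` table and the `voice_id` helper are illustrative names and are not taken from this Space's `app.py`.

```python
# Illustrative catalogue of the voices listed above (names copied from this README).
VOICES = {
    ("american", "female"): ["Heart", "Bella", "Nicole", "Aoede", "Kore", "Sarah",
                             "Nova", "Sky", "Alloy", "Jessica", "River"],
    ("american", "male"): ["Michael", "Fenrir", "Puck", "Echo", "Eric", "Liam",
                           "Onyx", "Adam"],
    ("british", "female"): ["Emma", "Isabella", "Alice", "Lily"],
    ("british", "male"): ["George", "Fable", "Lewis", "Daniel"],
}


def voice_id(accent, gender, name):
    """Build an identifier such as 'af_heart' (assumed naming convention)."""
    if name not in VOICES[(accent, gender)]:
        raise ValueError(f"unknown {accent} {gender} voice: {name}")
    # First letter of accent + first letter of gender + "_" + lowercase name.
    return f"{accent[0]}{gender[0]}_{name.lower()}"


# Flat list of identifiers, e.g. for populating a dropdown.
ALL_VOICE_IDS = [
    voice_id(accent, gender, name)
    for (accent, gender), names in VOICES.items()
    for name in names
]
```

Keeping the display names and the identifiers in one table means a voice dropdown and the synthesis call cannot drift apart. The table covers only the voices listed in this section.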
Model Architecture
Kokoro-82M is based on StyleTTS 2:
- Parameters: 82 Million
- Decoder: ISTFTNet
- Training: A few hundred hours of permissively licensed audio data
- License: Apache 2.0
- Paper: StyleTTS 2 (arxiv.org/abs/2306.07691)
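For a rough sense of scale, the parameter count translates into a weight footprint with simple arithmetic. The sketch below takes the "82 Million" figure as exact and assumes standard storage widths (4 bytes per parameter for float32, 2 bytes for float16). Real checkpoint files also carry metadata, so sizes on disk will differ somewhat.

```python
PARAMS = 82_000_000  # "82 Million" from the list above, taken as exact for illustration

# Storage width per parameter for two common dtypes (assumption: no quantization).
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_megabytes(params, dtype):
    """Approximate weight size in megabytes (10**6 bytes) for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_megabytes(PARAMS, dtype):.0f} MB")
```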
🎯 Features
- 54 Unique Voices - American & British accents
- Natural Prosody - Human-like intonation
- Fast Generation - 2-5 seconds per sentence
- Speed Control - 0.5x to 2x playback
- High Quality - StyleTTS 2 architecture
- Open Source - Apache 2.0 license
💻 Technology Stack
- Backend: Gradio + Hugging Face Inference API
- Model: Kokoro-82M (hexgrad/Kokoro-82M)
- Architecture: StyleTTS 2 + ISTFTNet
- Deployment: Hugging Face Spaces
Usage
- Choose Voice - Select from 54 premium voices
- Enter Text - Type or paste your content
- Adjust Speed - Control playback rate (0.5x - 2x)
- Generate - Click to synthesize speech
- Download - Save audio as WAV file
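The steps above can be sketched in code. The example below is not this Space's `app.py`: `synthesize` is a hypothetical stand-in that returns a short sine tone instead of calling the model, so that the speed check (0.5x to 2x) and the WAV export can be shown with the Python standard library alone. The 24 kHz sample rate is an assumption about the model's output format.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000             # assumed output sample rate in Hz
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # range offered by the speed control


def check_speed(speed):
    """Reject speeds outside the supported 0.5x to 2x range."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return float(speed)


def synthesize(text, voice, speed=1.0):
    """Hypothetical placeholder for the model call.

    Returns a 440 Hz tone whose length grows with the text and shrinks as
    speed increases. The voice argument is accepted but unused in this stub.
    """
    speed = check_speed(speed)
    if not text.strip():
        raise ValueError("text must not be empty")
    seconds = 0.05 * len(text) / speed
    n_samples = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n_samples)]


def to_wav_bytes(samples):
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV data."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(SAMPLE_RATE)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wav.writeframes(frames)
    return buf.getvalue()


# Choose voice, enter text, adjust speed, generate, then keep the bytes for download.
wav_data = to_wav_bytes(synthesize("Hello from Kokoro.", voice="af_heart", speed=1.25))
```

Validating the speed before synthesis keeps out-of-range values from reaching the model, and writing to an in-memory buffer lets the same bytes be played back or offered as a download.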
Comparison with Other Models
| Feature | Kokoro-82M | SpeechT5 | VITS |
|---|---|---|---|
| Voices | 54 | 1 | Variable |
| Quality | Excellent | Good | Good |
| Speed | Fast | Medium | Fast |
| Accents | US/UK | Generic | Variable |
| License | Apache 2.0 | MIT | MIT |
Credits
- Model: hexgrad/Kokoro-82M
- Base Architecture: StyleTTS 2 by Li et al.
- Decoder: ISTFTNet
- Training: Ethically sourced, permissively licensed data only
License
Apache 2.0 - Free for commercial use
Links
- Model Card
- StyleTTS 2 Paper
- GitHub (ONNX)
Built with ❤️ using Kokoro-82M & Gradio