Space: WSYBYT/ybtts

Major Update: Kokoro-82M with 54 Premium Voices

by masbudjj - opened Oct 22, 2025

base: refs/heads/main ← from: refs/pr/6

README.md +113 -56

@@ -1,79 +1,136 @@
 ---
-title: TTS Browser Demo - Transformers.js
 emoji: 🎙️
-colorFrom: blue
-colorTo: indigo
-sdk: static
 pinned: false
 ---
-# 🎙️ Text-to-Speech Browser Demo
-A **TTS (Text-to-Speech)** demo that runs **100% in the browser** using **Transformers.js** from Hugging Face.
-No Python server needed, no hosting costs!
-## ✨ Full Feature List
-### 🎙️ TTS Models (3 Options)
-- **SpeechT5** (Fast) - Fast model for testing (`Xenova/speecht5_tts`)
-- **SpeechT5 VCTK HiFi** (Best Quality) - Highest audio quality (`Xenova/speecht5_tts_vctk_hifi`)
-- **MMS English** (Meta) - Meta's multilingual model (`Xenova/mms-tts-eng`)
-### 🎚️ Voice Controls (All Working!)
-- **Speed Control** (0.5x - 2x) - Real-time playback speed adjustment
-- **Temperature** (0.1 - 1.5) - Controls output creativity
-- **Top P Sampling** (0.01 - 1.0) - Nucleus sampling for natural variation
-- **Top K** (0-50) - Token selection control
-- **Repetition Penalty** (0.8 - 2.0) - Avoids word repetition
-- **Length Penalty** (0.1 - 2.0) - Controls audio length
-- **Num Beams** (1-8) - Beam search for better quality
-### 🎤 Speaker Voice Cloning
-- Upload an audio file to clone voice characteristics
-- Supports all audio formats (MP3, WAV, M4A, etc.)
-- Automatic processing of speaker embeddings
-### 💻 Technology
-- ⚡ **100% Client-Side** - Zero server dependency
-- 🚀 **WebGPU Acceleration** - Auto-detect & fallback to WASM
-- 💾 **Smart Caching** - Model is cached after the first download
-- 📊 **Real-time Logging** - Activity log with timestamps
-- 🎨 **Modern UI** - Dark theme, glassmorphism, smooth animations
-- 📱 **Fully Responsive** - Works on mobile, tablet, desktop
-## 📖 How to Use
-1. **Duplicate this Space** or clone the repository
-2. Open the Space URL and wait for the model to load (the first run downloads the ONNX weights)
-3. **Choose a Model** from the dropdown in the right panel
-4. Type the text you want to convert to speech
-5. Click **Generate**
-6. The audio will appear with a **Download** button
-## 🛠️ Technology
-- [Transformers.js](https://huggingface.co/docs/transformers.js) v3.x
-- Vanilla JavaScript (ES6 Modules)
-- ONNX Runtime (WASM/WebGPU)
-## 📝 Notes
-- Some UI controls (emotion vector, speaker prompt) are placeholders for future feature expansion
-- The model is cached in the browser after the first download
-- Use a modern browser (Chrome, Edge, Firefox) for optimal performance
-## 🚀 Deploy It Yourself
-```bash
-# Clone repository
-git clone <your-repo-url>
-# Deploy to Hugging Face Spaces
-# 1. Create a new Space at huggingface.co/spaces
-# 2. Choose "Static" as the SDK
-# 3. Upload all files or connect a Git repository
-```
 ---
-**This template is production-ready!** 🎉

 ---
+title: Kokoro-82M TTS - 54 Premium Voices
 emoji: 🎙️
+colorFrom: indigo
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
 pinned: false
+license: apache-2.0
 ---
+# 🎙️ Kokoro-82M Text-to-Speech
+**World-Class TTS with 54 Premium Voices**
+## ✨ Features
+### 🎭 54 Premium Voices
+#### 🇺🇸 American English (19 voices)
+**Female (11 voices):**
+- Heart - Warm & Friendly
+- Bella - Elegant & Smooth
+- Nicole - Professional
+- Aoede - Cheerful
+- Kore - Gentle
+- Sarah - Clear
+- Nova - Modern
+- Sky - Light
+- Alloy - Versatile
+- Jessica - Natural
+- River - Calm
+**Male (8 voices):**
+- Michael - Deep & Authoritative
+- Fenrir - Strong
+- Puck - Playful
+- Echo - Resonant
+- Eric - Professional
+- Liam - Friendly
+- Onyx - Rich
+- Adam - Natural
+#### 🇬🇧 British English (8 voices)
+**Female (4 voices):**
+- Emma - Refined
+- Isabella - Elegant
+- Alice - Clear
+- Lily - Soft
+**Male (4 voices):**
+- George - Distinguished
+- Fable - Storyteller
+- Lewis - Smooth
+- Daniel - Professional
+---
+## 🏗️ Model Architecture
+**Kokoro-82M** is based on **StyleTTS 2**:
+- **Parameters**: 82 Million
+- **Decoder**: ISTFTNet
+- **Training**: A few hundred hours of permissively licensed data
+- **License**: Apache 2.0
+- **Paper**: [StyleTTS 2 (arxiv.org/abs/2306.07691)](https://arxiv.org/abs/2306.07691)
+---
+## 🎯 Highlights
+✅ **54 Unique Voices** - Including the 27 American & British English voices listed above
+✅ **Natural Prosody** - Human-like intonation
+✅ **Fast Generation** - 2-5 seconds per sentence
+✅ **Speed Control** - 0.5x to 2x playback
+✅ **High Quality** - StyleTTS 2 architecture
+✅ **Open Source** - Apache 2.0 license
+---
+## 💻 Technology Stack
+- **Backend**: Gradio + Hugging Face Inference API
+- **Model**: Kokoro-82M (hexgrad/Kokoro-82M)
+- **Architecture**: StyleTTS 2 + ISTFTNet
+- **Deployment**: Hugging Face Spaces
+---
+## 🚀 Usage
+1. **Choose Voice** - Select from 54 premium voices
+2. **Enter Text** - Type or paste your content
+3. **Adjust Speed** - Control playback rate (0.5x - 2x)
+4. **Generate** - Click to synthesize speech
+5. **Download** - Save audio as WAV file
+---
+## 📊 Comparison with Other Models
+| Feature | Kokoro-82M | SpeechT5 | VITS |
+|---------|-----------|----------|------|
+| **Voices** | 54 | 1 | Variable |
+| **Quality** | Excellent | Good | Good |
+| **Speed** | Fast | Medium | Fast |
+| **Accents** | US/UK | Generic | Variable |
+| **License** | Apache 2.0 | MIT | MIT |
+---
+## 🎓 Credits
+- **Model**: [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M)
+- **Base Architecture**: StyleTTS 2 by Li et al.
+- **Decoder**: ISTFTNet
+- **Training**: Permissively licensed data only
+---
+## 📝 License
+Apache 2.0 - Free for commercial use
+---
+## 🔗 Links
+- 📄 [Model Card](https://huggingface.co/hexgrad/Kokoro-82M)
+- 📜 [StyleTTS 2 Paper](https://arxiv.org/abs/2306.07691)
+- 🐙 [GitHub (ONNX)](https://github.com/thewh1teagle/kokoro-onnx)
 ---
+**Built with ❤️ using Kokoro-82M & Gradio**
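
Reviewer sketch (not part of the diff): the README lists its voices by accent and gender and describes a usage flow of choosing a voice, entering text, and adjusting speed between 0.5x and 2x. A minimal Python sketch of how that list and those inputs could be modeled and validated is shown below. The names `VOICES`, `count_voices`, `find_voice`, `clamp_speed`, and `build_request` are hypothetical and do not come from the PR's `app.py`; the request dictionary is an illustrative shape, not the Hugging Face Inference API payload.

```python
# Hypothetical voice catalogue transcribed from the README's voice list.
# Keys are (accent, gender); values are the display names shown in the README.
VOICES = {
    ("American English", "Female"): [
        "Heart", "Bella", "Nicole", "Aoede", "Kore", "Sarah",
        "Nova", "Sky", "Alloy", "Jessica", "River",
    ],
    ("American English", "Male"): [
        "Michael", "Fenrir", "Puck", "Echo", "Eric", "Liam", "Onyx", "Adam",
    ],
    ("British English", "Female"): ["Emma", "Isabella", "Alice", "Lily"],
    ("British English", "Male"): ["George", "Fable", "Lewis", "Daniel"],
}

# Speed range stated in the README (0.5x to 2x).
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def count_voices(catalogue=VOICES):
    """Return the total number of voices in the catalogue."""
    return sum(len(names) for names in catalogue.values())


def find_voice(name, catalogue=VOICES):
    """Return the (accent, gender) pair for a display name, or raise KeyError."""
    for group, names in catalogue.items():
        if name in names:
            return group
    raise KeyError(f"unknown voice: {name!r}")


def clamp_speed(speed):
    """Clamp a requested speed into the documented 0.5x-2x range."""
    return max(MIN_SPEED, min(MAX_SPEED, float(speed)))


def build_request(voice, text, speed=1.0):
    """Validate the three user inputs and bundle them into a plain dict.

    The dict layout is an assumption made for illustration only.
    """
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    accent, gender = find_voice(voice)
    return {
        "voice": voice,
        "accent": accent,
        "gender": gender,
        "text": text.strip(),
        "speed": clamp_speed(speed),
    }


if __name__ == "__main__":
    print(count_voices())
    print(build_request("Fable", "Once upon a time.", speed=3))
```

Counting the catalogue this way makes it easy to check the enumerated voices against the headline figure: the names listed in the README sum to 27 (19 American English and 8 British English), so the remaining voices in the advertised 54 are not enumerated in this README.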