Major Update: Kokoro-82M with 54 Premium Voices

#6
by masbudjj - opened
Files changed (1)
  1. README.md +113 -56
README.md CHANGED
@@ -1,79 +1,136 @@
  ---
- title: TTS Browser Demo - Transformers.js
  emoji: 🎙️
- colorFrom: blue
- colorTo: indigo
- sdk: static
  pinned: false
  ---

- # 🎙️ Text-to-Speech Browser Demo

- A **TTS (Text-to-Speech)** demo that runs **100% in the browser** using **Transformers.js** from Hugging Face.
- No Python server needed, no hosting costs!

- ## ✨ Full Feature List

- ### 🎙️ TTS Models (3 Options)
- - **SpeechT5** (Fast) - Fast model for testing (`Xenova/speecht5_tts`)
- - **SpeechT5 VCTK HiFi** (Best Quality) - Highest audio quality (`Xenova/speecht5_tts_vctk_hifi`)
- - **MMS English** (Meta) - Meta's multilingual model (`Xenova/mms-tts-eng`)

- ### 🎚️ Voice Controls (All Working!)
- - **Speed Control** (0.5x - 2x) - Real-time playback speed adjustment
- - **Temperature** (0.1 - 1.5) - Controls output creativity
- - **Top P Sampling** (0.01 - 1.0) - Nucleus sampling for natural variation
- - **Top K** (0-50) - Token selection control
- - **Repetition Penalty** (0.8 - 2.0) - Avoids repeated words
- - **Length Penalty** (0.1 - 2.0) - Controls audio length
- - **Num Beams** (1-8) - Beam search for better quality

- ### 🎤 Speaker Voice Cloning
- - Upload an audio file to clone its voice characteristics
- - Supports all audio formats (MP3, WAV, M4A, etc.)
- - Speaker embeddings are processed automatically

- ### 💻 Technology
- - ⚡ **100% Client-Side** - Zero server dependency
- - 🚀 **WebGPU Acceleration** - Auto-detects WebGPU and falls back to WASM
- - 💾 **Smart Caching** - The model is cached after the first download
- - 📊 **Real-time Logging** - Timestamped activity log
- - 🎨 **Modern UI** - Dark theme, glassmorphism, smooth animations
- - 📱 **Fully Responsive** - Works on mobile, tablet, desktop

- ## 📖 How to Use

- 1. **Duplicate this Space** or clone the repository
- 2. Open the Space URL and wait for the model to load (the first run downloads the ONNX weights)
- 3. **Select a model** from the dropdown in the right panel
- 4. Type the text you want to convert to speech
- 5. Click **Generate**
- 6. The audio appears with a **Download** button

- ## 🛠️ Technology

- - [Transformers.js](https://huggingface.co/docs/transformers.js) v3.x
- - Vanilla JavaScript (ES6 Modules)
- - ONNX Runtime (WASM/WebGPU)

- ## 📝 Notes

- - Some UI controls (emotion vector, speaker prompt) are placeholders for future feature expansion
- - The model is cached in the browser after the first download
- - Use a modern browser (Chrome, Edge, Firefox) for best performance

- ## 🚀 Deploy It Yourself

- ```bash
- # Clone the repository
- git clone <your-repo-url>

- # Deploy to Hugging Face Spaces
- # 1. Create a new Space at huggingface.co/spaces
- # 2. Choose "Static" as the SDK
- # 3. Upload all files or connect a Git repository
- ```

  ---

- **This template is production-ready!** 🎉
 
  ---
+ title: Kokoro-82M TTS - 54 Premium Voices
  emoji: 🎙️
+ colorFrom: indigo
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 4.44.0
+ app_file: app.py
  pinned: false
+ license: apache-2.0
  ---

+ # 🎙️ Kokoro-82M Text-to-Speech
+
+ **World-Class TTS with 54 Premium Voices**
+
+ ## ✨ Features
+
+ ### 🎭 54 Premium Voices
+
+ #### 🇺🇸 American English (19 voices)
+ **Female (11 voices):**
+ - Heart - Warm & Friendly
+ - Bella - Elegant & Smooth
+ - Nicole - Professional
+ - Aoede - Cheerful
+ - Kore - Gentle
+ - Sarah - Clear
+ - Nova - Modern
+ - Sky - Light
+ - Alloy - Versatile
+ - Jessica - Natural
+ - River - Calm
+
+ **Male (8 voices):**
+ - Michael - Deep & Authoritative
+ - Fenrir - Strong
+ - Puck - Playful
+ - Echo - Resonant
+ - Eric - Professional
+ - Liam - Friendly
+ - Onyx - Rich
+ - Adam - Natural
+
+ #### 🇬🇧 British English (8 voices)
+ **Female (4 voices):**
+ - Emma - Refined
+ - Isabella - Elegant
+ - Alice - Clear
+ - Lily - Soft
+
+ **Male (4 voices):**
+ - George - Distinguished
+ - Fable - Storyteller
+ - Lewis - Smooth
+ - Daniel - Professional
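
The voice list above can be held in a single lookup table that drives both the dropdown and the per-accent counts. This is a hypothetical sketch, not code from this Space's `app.py`. It assumes voice IDs follow Kokoro's `<accent><gender>_<name>` convention (for example `af_heart` or `bm_george`): `a` means American, `b` means British, and `f` or `m` gives the gender. `VOICE_NAMES`, `VOICES`, `voice_id`, and `count_by_accent` are illustrative names.

```python
from collections import Counter

# Hypothetical catalog of the voices listed in this README.
# Assumption: Kokoro IDs are "<accent><gender>_<name>", with accent "a"
# (American) or "b" (British) and gender "f" or "m".
VOICE_NAMES = {
    ("a", "f"): ["heart", "bella", "nicole", "aoede", "kore", "sarah",
                 "nova", "sky", "alloy", "jessica", "river"],
    ("a", "m"): ["michael", "fenrir", "puck", "echo", "eric", "liam",
                 "onyx", "adam"],
    ("b", "f"): ["emma", "isabella", "alice", "lily"],
    ("b", "m"): ["george", "fable", "lewis", "daniel"],
}

# Display name (as shown in the dropdown) -> assumed Kokoro voice ID.
VOICES = {
    name.capitalize(): f"{accent}{gender}_{name}"
    for (accent, gender), names in VOICE_NAMES.items()
    for name in names
}


def voice_id(display_name: str) -> str:
    """Return the assumed Kokoro voice ID for a display name such as 'Emma'."""
    try:
        return VOICES[display_name]
    except KeyError:
        raise ValueError(f"Unknown voice: {display_name!r}") from None


def count_by_accent() -> Counter:
    """Count catalog entries per accent prefix ('a' or 'b')."""
    return Counter(vid[0] for vid in VOICES.values())
```

With this layout, `voice_id("Heart")` gives `"af_heart"`. `count_by_accent()` reports 19 American and 8 British entries, matching the headings in the list above.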

+ ---

+ ## 🏗️ Model Architecture

+ **Kokoro-82M** is based on **StyleTTS 2**:
+ - **Parameters**: 82 million
+ - **Decoder**: ISTFTNet
+ - **Training**: A few hundred hours of permissively licensed data
+ - **License**: Apache 2.0
+ - **Paper**: [StyleTTS 2 (arxiv.org/abs/2306.07691)](https://arxiv.org/abs/2306.07691)

+ ---

+ ## 🎯 Highlights

+ - ✅ **54 Unique Voices** - American & British accents
+ - ✅ **Natural Prosody** - Human-like intonation
+ - ✅ **Fast Generation** - 2-5 seconds per sentence
+ - ✅ **Speed Control** - 0.5x to 2x playback
+ - ✅ **High Quality** - StyleTTS 2 architecture
+ - ✅ **Open Source** - Apache 2.0 license
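
The speed control is a multiplier limited to the 0.5x to 2x range stated in this README. Below is a minimal sketch of how a backend might enforce that range. It assumes speed acts as a pure playback-rate factor, under which a clip's duration is divided by the speed. `clamp_speed` and `playback_duration` are hypothetical names, not taken from `app.py`.

```python
import math

# Range stated in this README; the names below are hypothetical.
MIN_SPEED = 0.5
MAX_SPEED = 2.0


def clamp_speed(speed: float) -> float:
    """Keep a requested speed inside the supported 0.5x-2x range."""
    if math.isnan(speed):
        raise ValueError("speed must be a number")
    return min(max(speed, MIN_SPEED), MAX_SPEED)


def playback_duration(base_seconds: float, speed: float) -> float:
    """Duration of a clip of `base_seconds` when played at rate `speed`."""
    # Assumes pure playback-rate scaling: 2x halves the duration, 0.5x doubles it.
    return base_seconds / clamp_speed(speed)
```

Clamping on the server side means an out-of-range value, for example one sent directly to the app's API, still yields valid output instead of an error.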
 
+ ---

+ ## 💻 Technology Stack

+ - **Backend**: Gradio + Hugging Face Inference API
+ - **Model**: Kokoro-82M (hexgrad/Kokoro-82M)
+ - **Architecture**: StyleTTS 2 + ISTFTNet
+ - **Deployment**: Hugging Face Spaces
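
The README does not include `app.py`, so the following is only one possible shape for the backend function. It assumes the `kokoro` Python package and its `KPipeline` interface as documented on the hexgrad/Kokoro-82M model card. It runs the model locally rather than through the Hugging Face Inference API listed above. `lang_code_for` and `synthesize` are hypothetical names.

```python
# Sketch only: runs Kokoro-82M locally via the `kokoro` package
# (assumed installed with `pip install kokoro`), not via the Inference API.
# lang_code_for and synthesize are hypothetical helper names.

SAMPLE_RATE = 24_000  # sample rate used in the Kokoro-82M model card examples


def lang_code_for(voice: str) -> str:
    """Kokoro voice IDs start with a language code: 'a' American, 'b' British."""
    code = voice[:1]
    if code not in ("a", "b"):
        raise ValueError(f"Expected an American or British voice, got {voice!r}")
    return code


def synthesize(text: str, voice: str = "af_heart", speed: float = 1.0):
    """Return (sample_rate, samples), a shape Gradio's gr.Audio accepts."""
    # Imported lazily so this module loads even where the model is not installed.
    import numpy as np
    from kokoro import KPipeline

    pipeline = KPipeline(lang_code=lang_code_for(voice))
    # The pipeline yields one (graphemes, phonemes, audio) chunk per text segment.
    chunks = [
        np.asarray(audio)
        for _, _, audio in pipeline(text, voice=voice, speed=speed)
    ]
    if not chunks:
        raise ValueError("No audio was generated for the given text")
    return SAMPLE_RATE, np.concatenate(chunks)
```

Returning a `(sample_rate, array)` tuple lets the function feed a Gradio audio output directly. Building the pipeline per call keeps the sketch short; a real app would more likely cache one pipeline per language code.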

+ ---

+ ## 🚀 Usage

+ 1. **Choose Voice** - Select from 54 premium voices
+ 2. **Enter Text** - Type or paste your content
+ 3. **Adjust Speed** - Control the playback rate (0.5x - 2x)
+ 4. **Generate** - Click to synthesize speech
+ 5. **Download** - Save the audio as a WAV file
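
The Download step saves a WAV file. To illustrate what such a file contains, here is a standard-library sketch that writes float samples in the range -1.0 to 1.0 as 16-bit mono PCM. It assumes mono audio at 24 kHz, the rate used in the Kokoro-82M model card examples. `write_wav` is a hypothetical helper, not part of this Space.

```python
import array
import sys
import wave


def write_wav(path: str, samples, sample_rate: int = 24_000) -> int:
    """Write float samples in [-1.0, 1.0] to a 16-bit mono WAV; return frames written."""
    pcm = array.array("h")  # signed 16-bit integers
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the value fits in 16 bits
        pcm.append(int(round(s * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV PCM data is little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    return len(pcm)
```

Clipping before conversion prevents values slightly outside -1.0 to 1.0 from overflowing the 16-bit range. One frame per sample holds because the file is mono.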

+ ---
+
+ ## 📊 Comparison with Other Models
+
+ | Feature | Kokoro-82M | SpeechT5 | VITS |
+ |---------|------------|----------|------|
+ | **Voices** | 54 | 1 | Variable |
+ | **Quality** | Excellent | Good | Good |
+ | **Speed** | Fast | Medium | Fast |
+ | **Accents** | US/UK | Generic | Variable |
+ | **License** | Apache 2.0 | Apache 2.0 | MIT |
+
+ ---
+
+ ## 🎓 Credits
+
+ - **Model**: [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M)
+ - **Base Architecture**: StyleTTS 2 by Li et al.
+ - **Decoder**: ISTFTNet
+ - **Training**: Ethical, permissively licensed data only
+
+ ---
+
+ ## 📝 License
+
+ Apache 2.0 - Free for commercial use
+
+ ---

+ ## 🔗 Links

+ - 📄 [Model Card](https://huggingface.co/hexgrad/Kokoro-82M)
+ - 📜 [StyleTTS 2 Paper](https://arxiv.org/abs/2306.07691)
+ - 🐙 [GitHub (ONNX)](https://github.com/thewh1teagle/kokoro-onnx)

  ---

+ **Built with ❤️ using Kokoro-82M & Gradio**