---
title: Kokoclone
emoji: 💻
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
python_version: 3.12.12
license: apache-2.0
short_description: Kokoro, But It Clones Voices Now
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# KokoClone
[Hugging Face Space](https://huggingface.co/spaces/PatnaikAshish/kokoclone)
[Model](https://huggingface.co/PatnaikAshish/kokoclone)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
## What is KokoClone?
**KokoClone** is a fast, real-time compatible multilingual voice cloning system built on top of **Kokoro-ONNX**, one of the fastest open-source neural TTS engines available today.
It allows you to:
* Type text in multiple languages
* Provide a short 3–10 second reference audio clip
* Instantly generate speech in that same voice
Just text → voice → cloned output.
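Because the reference clip is expected to be roughly 3–10 seconds long, it can help to check a file before submitting it. The following is a minimal sketch that assumes the clip is an uncompressed PCM WAV file; the helper names `clip_duration_seconds` and `is_valid_reference` are illustrative and not part of KokoClone.

```python
import wave


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path: str, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """Check that the clip falls inside the suggested 3-10 second window."""
    return min_s <= clip_duration_seconds(path) <= max_s
```

Calling `is_valid_reference("reference.wav")` before cloning gives an early warning for clips that are too short or too long. Other formats such as MP3 would need a different reader.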
## Why Kokoro?
KokoClone is powered by **Kokoro-ONNX**, a highly optimized neural TTS engine designed for:
* Extremely fast inference
* Natural prosody and expressive speech
* Lightweight ONNX runtime compatibility
* Real-time deployment on CPU
* Even faster performance with GPU
Unlike many heavy TTS systems, Kokoro is lightweight and responsive — making KokoClone suitable for real-time applications, voice assistants, demos, and interactive tools.
## Features
### Multilingual Speech Generation
Generate native speech in:
* English (`en`)
* Hindi (`hi`)
* French (`fr`)
* Japanese (`ja`)
* Chinese (`zh`)
* Italian (`it`)
* Portuguese (`pt`)
* Spanish (`es`)
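If you pass language codes around in your own scripts, a small lookup table keeps typos from reaching the model. This sketch only mirrors the list above; `SUPPORTED_LANGUAGES` and `normalize_lang` are hypothetical names, and KokoClone itself may validate codes differently.

```python
# Language codes copied from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "hi": "Hindi",
    "fr": "French",
    "ja": "Japanese",
    "zh": "Chinese",
    "it": "Italian",
    "pt": "Portuguese",
    "es": "Spanish",
}


def normalize_lang(code: str) -> str:
    """Lower-case and trim a language code, rejecting unsupported ones."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {code!r}; expected one of {sorted(SUPPORTED_LANGUAGES)}"
        )
    return key
```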
### Zero-Shot Voice Cloning
Upload a short voice sample and KokoClone transfers its vocal characteristics to the generated speech.
### Real-Time Friendly
Built on Kokoro’s efficient ONNX runtime pipeline, KokoClone runs smoothly on:
* Standard laptops (CPU)
* Workstations (GPU)
### Automatic Model Handling
On first run, required model files are downloaded automatically and placed in the correct directories.
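The general download-if-missing pattern can be sketched as follows. This is not the project's actual downloader: the function name `ensure_file`, the `.part` suffix, and any URL you pass in are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def ensure_file(
    url: str,
    dest: Path,
    fetch: Callable[[str, str], object] = urlretrieve,
) -> Path:
    """Download `url` to `dest` only if `dest` does not exist yet."""
    if dest.exists():
        return dest  # already downloaded on an earlier run
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # is never mistaken for a complete file.
    tmp = dest.with_name(dest.name + ".part")
    fetch(url, str(tmp))
    tmp.replace(dest)
    return dest
```

The `fetch` parameter exists so the download step can be swapped out, for example in tests or when a different HTTP client is preferred.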
### Built-in Web Interface
Includes a clean and responsive Gradio UI for quick testing and demos.
## Live Demo
Try it instantly without installing anything:
👉 **[KokoClone on Hugging Face Spaces](https://huggingface.co/spaces/PatnaikAshish/kokoclone)**
## Installation
Recommended: Use `conda` for a clean environment.
### Clone the Repository
```bash
git clone https://github.com/Ashish-Patnaik/kokoclone.git
cd kokoclone
```
### Create Environment
```bash
conda create -n kokoclone python=3.12.12 -y
conda activate kokoclone
```
## Install Dependencies
### CPU Installation (Recommended for most users)
```bash
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
```
### GPU Installation (NVIDIA users)
```bash
pip install -r requirements.txt
pip install "kokoro-onnx[gpu]"
```
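After a GPU install, one way to confirm that ONNX Runtime can see the GPU is to list its execution providers. The sketch below assumes the GPU path uses ONNX Runtime's `CUDAExecutionProvider`; the helper names are illustrative, and the function returns an empty list if `onnxruntime` is not installed.

```python
def available_onnx_providers() -> list[str]:
    """List ONNX Runtime execution providers, or [] if it is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


def has_cuda() -> bool:
    """True when the CUDA execution provider is available."""
    return "CUDAExecutionProvider" in available_onnx_providers()


if __name__ == "__main__":
    print("Providers:", available_onnx_providers())
    print("CUDA available:", has_cuda())
```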
## Usage
KokoClone can be used in three ways:
### Web Interface
Launch the Gradio app:
```bash
python app.py
```
Then open the browser interface to:
* Enter text
* Select language
* Upload a reference voice
* Generate cloned speech
### Command Line
```bash
python cli.py --text "Hello from KokoClone" --lang en --ref reference.wav --out output.wav
```
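To drive the command-line tool from another script, the same flags can be assembled programmatically. This sketch uses only the `--text`, `--lang`, `--ref`, and `--out` flags shown above and assumes it runs from the repository root; `build_cli_command` and `run_cli` are hypothetical helper names.

```python
import subprocess
import sys


def build_cli_command(text: str, lang: str, ref: str, out: str) -> list[str]:
    """Build the argument list for cli.py using the documented flags."""
    return [
        sys.executable, "cli.py",
        "--text", text,
        "--lang", lang,
        "--ref", ref,
        "--out", out,
    ]


def run_cli(text: str, lang: str, ref: str, out: str) -> None:
    """Run cli.py and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_cli_command(text, lang, ref, out), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains spaces or quotes.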
### Python API
```python
from core.cloner import KokoClone

cloner = KokoClone()

cloner.generate(
    text="This voice is cloned using KokoClone.",
    lang="en",
    reference_audio="reference.wav",
    output_path="output.wav",
)
```
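For several sentences or languages with one reference voice, the `generate` call above can be wrapped in a loop. The sketch below assumes only the four keyword arguments shown above; `clone_batch`, the `Cloner` protocol, and the output file naming are illustrative and not part of the library.

```python
from pathlib import Path
from typing import Protocol


class Cloner(Protocol):
    """Anything exposing the generate() call shown above."""

    def generate(
        self, text: str, lang: str, reference_audio: str, output_path: str
    ) -> object: ...


def clone_batch(
    cloner: Cloner,
    items: list[tuple[str, str]],
    reference_audio: str,
    out_dir: str,
) -> list[Path]:
    """Generate one WAV per (lang, text) pair and return the output paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, (lang, text) in enumerate(items):
        target = out / f"{index:02d}_{lang}.wav"
        cloner.generate(
            text=text,
            lang=lang,
            reference_audio=reference_audio,
            output_path=str(target),
        )
        outputs.append(target)
    return outputs
```

With the real class this would be called as `clone_batch(KokoClone(), [("en", "Hello"), ("fr", "Bonjour")], "reference.wav", "outputs")`.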
## Project Structure
```
app.py → Gradio Web Interface
cli.py → Command-line tool
core/cloner.py → Core inference engine
inference.py → Example usage script
model/ → Downloaded TTS model weights
voice/ → Voice embeddings
```
## Use Cases
* Voice assistant prototypes
* Real-time TTS demos
* Multilingual narration tools
* Content creation
* Research experiments
* Interactive AI applications
## Acknowledgments
This project builds upon:
* **Kokoro-ONNX** — for fast and efficient neural speech synthesis
* **Kanade Tokenizer** — for voice conversion architecture
## License
Licensed under the Apache 2.0 License.