FLENclone

Sleeping

File size: 4,535 Bytes

---
title: Kokoclone
emoji: 💻
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
python_version: 3.12.12
license: apache-2.0
short_description: Kokoro, But It Clones Voices Now
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# KokoClone

[![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Live%20Demo-blue)](https://huggingface.co/spaces/PatnaikAshish/kokoclone)
[![Hugging Face Models](https://img.shields.io/badge/🤗%20Models-Repository-orange)](https://huggingface.co/PatnaikAshish/kokoclone)
[![Python](https://img.shields.io/badge/Python-3.10+-3776AB.svg?logo=python\&logoColor=white)]
[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://opensource.org/licenses/Apache-2.0)



##  What is KokoClone?

**KokoClone** is a fast, real-time compatible multilingual voice cloning system built on top of **Kokoro-ONNX**, one of the fastest open-source neural TTS engines available today.

It allows you to:

* Type text in multiple languages
* Provide a short 3–10 second reference audio clip
* Instantly generate speech in that same voice


Just text → voice → cloned output.


##  Why Kokoro?

KokoClone is powered by **Kokoro-ONNX**, a highly optimized neural TTS engine designed for:

*  Extremely fast inference
*  Natural prosody and expressive speech
*  Lightweight ONNX runtime compatibility
*  Real-time deployment on CPU
*  Even faster performance with GPU

Unlike many heavy TTS systems, Kokoro is lightweight and responsive — making KokoClone suitable for real-time applications, voice assistants, demos, and interactive tools.


##  Features

###  Multilingual Speech Generation

Generate native speech in:

* English (`en`)
* Hindi (`hi`)
* French (`fr`)
* Japanese (`ja`)
* Chinese (`zh`)
* Italian (`it`)
* Portuguese (`pt`)
* Spanish (`es`)


###  Zero-Shot Voice Cloning

Upload a short voice sample and KokoClone transfers its vocal characteristics to the generated speech.


###  Real-Time Friendly

Built on Kokoro’s efficient ONNX runtime pipeline, KokoClone runs smoothly on:

* Standard laptops (CPU)
* Workstations (GPU)


###  Automatic Model Handling

On first run, required model files are downloaded automatically and placed in the correct directories.


###  Built-in Web Interface

Includes a clean and responsive Gradio UI for quick testing and demos.



##  Live Demo

Try it instantly without installing anything:

👉 **[KokoClone on Hugging Face Spaces](https://huggingface.co/spaces/PatnaikAshish/kokoclone)**



##  Installation

Recommended: Use `conda` for a clean environment.

###  Clone the Repository

```bash
git clone https://github.com/Ashish-Patnaik/kokoclone.git
cd kokoclone
```

###  Create Environment

```bash
conda create -n kokoclone python=3.12.12 -y
conda activate kokoclone
```



##  Install Dependencies

###  CPU Installation (Recommended for most users)

```bash
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
```

###  GPU Installation (NVIDIA users)

```bash
pip install -r requirements.txt
pip install kokoro-onnx[gpu]
```



##  Usage

KokoClone can be used in three ways:



###  Web Interface

Launch the Gradio app:

```bash
python app.py
```

Then open the browser interface to:

* Enter text
* Select language
* Upload a reference voice
* Generate cloned speech



###  Command Line

```bash
python cli.py --text "Hello from KokoClone" --lang en --ref reference.wav --out output.wav
```



###  Python API

```python
from core.cloner import KokoClone

cloner = KokoClone()

cloner.generate(
    text="This voice is cloned using KokoClone.",
    lang="en",
    reference_audio="reference.wav",
    output_path="output.wav"
)
```



##  Project Structure

```
app.py              → Gradio Web Interface
cli.py              → Command-line tool
core/cloner.py      → Core inference engine
inference.py        → Example usage script
model/              → Downloaded TTS model weights
voice/              → Voice embeddings
```



##  Use Cases

* Voice assistant prototypes
* Real-time TTS demos
* Multilingual narration tools
* Content creation
* Research experiments
* Interactive AI applications



##  Acknowledgments

This project builds upon:

* **Kokoro-ONNX** — for fast and efficient neural speech synthesis
* **Kanade Tokenizer** — for voice conversion architecture


##  License

Licensed under the Apache 2.0 License.