
metadata
title: Kokoclone
emoji: 💻
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
python_version: 3.12.12
license: apache-2.0
short_description: Kokoro, But It Clones Voices Now


KokoClone


What is KokoClone?

KokoClone is a fast, real-time-capable multilingual voice cloning system built on top of Kokoro-ONNX, one of the fastest open-source neural TTS engines currently available.

It allows you to:

  • Type text in multiple languages
  • Provide a short 3–10 second reference audio clip
  • Instantly generate speech in that same voice

In short: text + reference voice → cloned speech.

Why Kokoro?

KokoClone is powered by Kokoro-ONNX, a highly optimized neural TTS engine designed for:

  • Extremely fast inference
  • Natural prosody and expressive speech
  • Lightweight execution on ONNX Runtime
  • Real-time deployment on CPU
  • Even faster inference on GPU

Unlike many heavier TTS systems, Kokoro is lightweight and responsive, which makes KokoClone suitable for real-time applications, voice assistants, demos, and interactive tools.

Features

Multilingual Speech Generation

Generate native speech in:

  • English (en)
  • Hindi (hi)
  • French (fr)
  • Japanese (ja)
  • Chinese (zh)
  • Italian (it)
  • Portuguese (pt)
  • Spanish (es)
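The codes above are the values passed as `lang` in the command-line and Python API examples. A minimal validation sketch, assuming these eight two-letter codes are exactly what KokoClone accepts; `SUPPORTED_LANGUAGES` and `validate_lang` are hypothetical names, not part of the project:

```python
# Hypothetical helper (not part of KokoClone): the language codes listed
# in this README, mapped to display names.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "hi": "Hindi",
    "fr": "French",
    "ja": "Japanese",
    "zh": "Chinese",
    "it": "Italian",
    "pt": "Portuguese",
    "es": "Spanish",
}


def validate_lang(code: str) -> str:
    """Normalize a language code and reject unsupported ones early."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized


if __name__ == "__main__":
    print(validate_lang(" EN "))
```

Failing fast on an unknown code gives a clearer error than letting it reach the synthesis step.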

Zero-Shot Voice Cloning

Upload a short voice sample and KokoClone transfers its vocal characteristics to the generated speech.
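Because the reference clip should be roughly 3–10 seconds long, it can help to check a file before uploading it. A minimal sketch using Python's standard `wave` module, assuming an uncompressed PCM WAV input; `reference_duration` and `check_reference` are hypothetical helpers, and the bounds come from this README's guidance rather than a documented hard limit:

```python
import wave

# Hypothetical pre-upload check; the 3-10 s window is the README's guidance,
# not a limit enforced by KokoClone as far as this README states.


def reference_duration(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path: str, min_s: float = 3.0, max_s: float = 10.0) -> float:
    """Return the clip duration, raising if it falls outside [min_s, max_s]."""
    duration = reference_duration(path)
    if not (min_s <= duration <= max_s):
        raise ValueError(
            f"{path}: {duration:.2f}s is outside the suggested {min_s}-{max_s}s range"
        )
    return duration
```

Non-WAV formats would need a separate decoder; this check does not cover them.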

Real-Time Friendly

Built on Kokoro’s efficient ONNX runtime pipeline, KokoClone runs smoothly on:

  • Standard laptops (CPU)
  • Workstations (GPU)
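This README does not document how KokoClone picks a device. As a sketch, assuming the models run through ONNX Runtime (as the Kokoro-ONNX name suggests), a script can prefer the CUDA execution provider when the GPU build is installed and fall back to CPU otherwise. `pick_providers` is a hypothetical helper; `onnxruntime.get_available_providers()` is the ONNX Runtime call that lists what the installed build supports:

```python
# Hypothetical helper; assumes KokoClone's models run on ONNX Runtime.


def pick_providers(available: list[str]) -> list[str]:
    """Order execution providers: CUDA first if present, CPU always as fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    # CPU is the fallback in standard ONNX Runtime builds, so always include it.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


if __name__ == "__main__":
    try:
        import onnxruntime as ort

        available = ort.get_available_providers()
    except ImportError:
        available = []  # onnxruntime not installed; assume CPU only
    print(pick_providers(available))
```

The returned list has the shape accepted by the `providers` argument of `onnxruntime.InferenceSession`.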

Automatic Model Handling

On first run, required model files are downloaded automatically and placed in the correct directories.
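The actual file names and download URLs are not listed in this README. The general pattern (download a file only when it is missing, and write to a temporary name first so an interrupted download is not mistaken for a complete one) can be sketched with the standard library; `ensure_file` and any URL passed to it are hypothetical:

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical helper; KokoClone's real file names and URLs are not given here.


def ensure_file(url: str, dest: str) -> Path:
    """Download url to dest unless dest already exists; return the path."""
    target = Path(dest)
    if target.exists():
        return target  # already fetched on a previous run
    target.parent.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    # Stream into a temporary file, then rename once the download completes.
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return target
```

Because the final rename happens only after the copy finishes, a crash mid-download leaves a `.part` file behind instead of a truncated model.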

Built-in Web Interface

Includes a clean and responsive Gradio UI for quick testing and demos.

Live Demo

Try it instantly without installing anything:

👉 KokoClone on Hugging Face Spaces

Installation

Recommended: Use conda for a clean environment.

Clone the Repository

git clone https://github.com/Ashish-Patnaik/kokoclone.git
cd kokoclone

Create Environment

conda create -n kokoclone python=3.12.12 -y
conda activate kokoclone

Install Dependencies

CPU Installation (Recommended for most users)

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt

GPU Installation (NVIDIA users)

pip install -r requirements.txt
pip install "kokoro-onnx[gpu]"
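To confirm either installation before launching anything, a short script can report which expected modules are importable. A minimal sketch; the module names are assumptions about what the commands above install (`requirements.txt` is not reproduced here), and `missing_modules` is a hypothetical helper:

```python
import importlib.util
import sys

# Assumed import names: requirements.txt is not reproduced in this README,
# and kokoro-onnx is assumed to provide the module `kokoro_onnx`.
EXPECTED_MODULES = ["torch", "torchaudio", "onnxruntime", "kokoro_onnx"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be located on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}")
    missing = missing_modules(EXPECTED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates modules without importing them, so the check is fast even when heavy packages such as `torch` are present.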

Usage

KokoClone can be used in three ways:

Web Interface

Launch the Gradio app:

python app.py

Then open the local URL that Gradio prints in the terminal (typically http://127.0.0.1:7860) to:

  • Enter text
  • Select language
  • Upload a reference voice
  • Generate cloned speech

Command Line

python cli.py --text "Hello from KokoClone" --lang en --ref reference.wav --out output.wav
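The four flags correspond to the parameters of the Python API's `generate()` method. As a hypothetical sketch of how such a front end can be wired with `argparse` (this is not the project's `cli.py`; it assumes only the four flags shown above, restricts `--lang` to the codes listed in this README, and does not call KokoClone itself):

```python
import argparse

# Hypothetical sketch, not the project's cli.py.
LANG_CODES = ["en", "hi", "fr", "ja", "zh", "it", "pt", "es"]


def build_parser() -> argparse.ArgumentParser:
    """Mirror the four flags shown in the README's CLI example."""
    parser = argparse.ArgumentParser(description="Clone a voice from a reference clip.")
    parser.add_argument("--text", required=True, help="Text to synthesize")
    parser.add_argument("--lang", required=True, choices=LANG_CODES, help="Language code")
    parser.add_argument("--ref", required=True, help="Reference audio, ideally 3-10 s")
    parser.add_argument("--out", required=True, help="Output WAV path")
    return parser


def to_generate_kwargs(args: argparse.Namespace) -> dict:
    """Translate CLI flags into the keyword arguments of generate()."""
    return {
        "text": args.text,
        "lang": args.lang,
        "reference_audio": args.ref,
        "output_path": args.out,
    }
```

Given a `KokoClone` instance, the parsed flags could then be forwarded as `cloner.generate(**to_generate_kwargs(args))`.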

Python API

from core.cloner import KokoClone

cloner = KokoClone()

cloner.generate(
    text="This voice is cloned using KokoClone.",
    lang="en",
    reference_audio="reference.wav",
    output_path="output.wav"
)
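Since the example creates `KokoClone()` once and then calls `generate()`, reusing one instance is a natural way to synthesize several lines in the same reference voice (this assumes, without documentation to confirm it, that the expensive setup happens in the constructor). A sketch under that assumption: `synthesize_batch` is a hypothetical helper that accepts any object exposing the `generate(text=..., lang=..., reference_audio=..., output_path=...)` signature shown above:

```python
from collections.abc import Iterable
from pathlib import Path
from typing import Protocol

# Hypothetical helper built on the generate() signature from the example above.


class Cloner(Protocol):
    """Anything with the generate() keyword signature used in the README."""

    def generate(self, text: str, lang: str, reference_audio: str, output_path: str): ...


def synthesize_batch(
    cloner: Cloner,
    lines: Iterable[tuple[str, str]],
    reference_audio: str,
    out_dir: str,
) -> list[Path]:
    """Generate one WAV per (text, lang) pair, all in the reference voice."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, (text, lang) in enumerate(lines):
        path = out / f"line_{index:03d}_{lang}.wav"
        cloner.generate(
            text=text,
            lang=lang,
            reference_audio=reference_audio,
            output_path=str(path),
        )
        paths.append(path)
    return paths
```

For example, `synthesize_batch(KokoClone(), [("Hello.", "en"), ("Bonjour.", "fr")], "reference.wav", "out")` would request `out/line_000_en.wav` and `out/line_001_fr.wav`, assuming `generate()` writes to `output_path` as its name suggests.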

Project Structure

app.py              → Gradio Web Interface
cli.py              → Command-line tool
core/cloner.py      → Core inference engine
inference.py        → Example usage script
model/              → Downloaded TTS model weights
voice/              → Voice embeddings

Use Cases

  • Voice assistant prototypes
  • Real-time TTS demos
  • Multilingual narration tools
  • Content creation
  • Research experiments
  • Interactive AI applications

Acknowledgments

This project builds upon:

  • Kokoro-ONNX: for fast and efficient neural speech synthesis
  • Kanade Tokenizer: for the voice conversion architecture

License

Licensed under the Apache 2.0 License.