title: Kokoclone
emoji: 💻
colorFrom: blue
colorTo: pink
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
python_version: 3.12.12
license: apache-2.0
short_description: Kokoro, But It Clones Voices Now
KokoClone
What is KokoClone?
KokoClone is a fast, multilingual voice cloning system, suitable for real-time use and built on top of Kokoro-ONNX, one of the fastest open-source neural TTS engines available today.
It allows you to:
- Type text in multiple languages
- Provide a short 3–10 second reference audio clip
- Instantly generate speech in that same voice
Just text → voice → cloned output.
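The 3–10 second guideline for the reference clip can be checked before cloning. The following is a minimal sketch using only Python's standard `wave` module. The helper names `clip_duration_seconds` and `is_suitable_reference` are illustrative and not part of KokoClone, and the check only handles uncompressed WAV files.

```python
import wave

# Illustrative bounds taken from the 3–10 second guideline above.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(path: str) -> bool:
    """True if the clip length falls inside the recommended range."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

Running this check up front makes it possible to warn about a clip that is too short or too long before any model work starts.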
Why Kokoro?
KokoClone is powered by Kokoro-ONNX, a highly optimized neural TTS engine designed for:
- Extremely fast inference
- Natural prosody and expressive speech
- Lightweight ONNX runtime compatibility
- Real-time deployment on CPU
- Even faster performance with GPU
Unlike many heavy TTS systems, Kokoro is lightweight and responsive — making KokoClone suitable for real-time applications, voice assistants, demos, and interactive tools.
Features
Multilingual Speech Generation
Generate native speech in:
- English (en)
- Hindi (hi)
- French (fr)
- Japanese (ja)
- Chinese (zh)
- Italian (it)
- Portuguese (pt)
- Spanish (es)
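When calling KokoClone from your own code, it can help to validate the language code before synthesis. This is a small sketch built from the codes listed above. The names `SUPPORTED_LANGUAGES` and `normalize_lang` are hypothetical helpers, not part of the project.

```python
# Language codes listed above, mapped to display names.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "hi": "Hindi",
    "fr": "French",
    "ja": "Japanese",
    "zh": "Chinese",
    "it": "Italian",
    "pt": "Portuguese",
    "es": "Spanish",
}


def normalize_lang(code: str) -> str:
    """Lower-case and validate a language code; raise ValueError if unsupported."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized
```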
Zero-Shot Voice Cloning
Upload a short voice sample and KokoClone transfers its vocal characteristics to the generated speech.
Real-Time Friendly
Built on Kokoro’s efficient ONNX runtime pipeline, KokoClone runs smoothly on:
- Standard laptops (CPU)
- Workstations (GPU)
Automatic Model Handling
On first run, required model files are downloaded automatically and placed in the correct directories.
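The download-on-first-run behavior follows a common pattern: skip the download when the file is already present, otherwise fetch it into the target directory. The sketch below illustrates that pattern only. KokoClone's actual logic may differ, and `ensure_file` and the example URL in the usage note are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def ensure_file(url: str, destination: Path,
                fetch: Callable[[str, str], object] = urlretrieve) -> bool:
    """Download `url` to `destination` unless the file already exists.

    Returns True if a download happened, False if the file was already present.
    """
    if destination.exists():
        return False
    destination.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # does not leave a truncated file that looks complete.
    partial = destination.with_name(destination.name + ".part")
    fetch(url, str(partial))
    partial.replace(destination)
    return True
```

A call such as `ensure_file("https://example.invalid/weights.onnx", Path("model/weights.onnx"))` (placeholder URL and file name) would download once and return `False` on every later run.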
Built-in Web Interface
Includes a clean and responsive Gradio UI for quick testing and demos.
Live Demo
Try it instantly without installing anything:
👉 KokoClone on Hugging Face Spaces
Installation
Recommended: Use conda for a clean environment.
Clone the Repository
git clone https://github.com/Ashish-Patnaik/kokoclone.git
cd kokoclone
Create Environment
conda create -n kokoclone python=3.12.12 -y
conda activate kokoclone
Install Dependencies
CPU Installation (Recommended for most users)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
GPU Installation (NVIDIA users)
pip install -r requirements.txt
pip install "kokoro-onnx[gpu]"
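After a GPU install, one way to confirm that ONNX Runtime can see the GPU is to list its execution providers. This sketch assumes the installed packages provide the `onnxruntime` module and that GPU inference goes through the CUDA execution provider. The function names are illustrative.

```python
def available_providers() -> list[str]:
    """List ONNX Runtime execution providers, or [] if onnxruntime is missing."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return list(ort.get_available_providers())


def has_cuda() -> bool:
    """True if the CUDA execution provider is available to ONNX Runtime."""
    return "CUDAExecutionProvider" in available_providers()
```

If `has_cuda()` returns `False` on a machine with an NVIDIA card, the GPU build of ONNX Runtime or a matching CUDA installation is probably missing, and inference would fall back to the CPU.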
Usage
KokoClone can be used in three ways:
Web Interface
Launch the Gradio app:
python app.py
Then open the browser interface to:
- Enter text
- Select language
- Upload a reference voice
- Generate cloned speech
Command Line
python cli.py --text "Hello from KokoClone" --lang en --ref reference.wav --out output.wav
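The four flags in the command above map naturally onto an `argparse` parser. The sketch below is not the project's `cli.py`. It only mirrors the flag names shown, and which flags are required and what the defaults are have been assumed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags shown in the command above (assumed shape)."""
    parser = argparse.ArgumentParser(description="Clone a voice from a reference clip.")
    parser.add_argument("--text", required=True, help="Text to synthesize")
    parser.add_argument("--lang", default="en", help="Language code, e.g. en or hi")
    parser.add_argument("--ref", required=True, help="Path to the reference audio clip")
    parser.add_argument("--out", default="output.wav", help="Where to write the result")
    return parser
```

The parsed values (`args.text`, `args.lang`, `args.ref`, `args.out`) could then be passed on to the Python API.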
Python API
from core.cloner import KokoClone

cloner = KokoClone()
cloner.generate(
    text="This voice is cloned using KokoClone.",
    lang="en",
    reference_audio="reference.wav",
    output_path="output.wav",
)
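Building on the `generate` call above, several lines in different languages can be rendered with the same reference voice. This is a sketch only. `clone_batch` is a hypothetical helper, and it relies solely on the keyword arguments shown in the example above.

```python
from pathlib import Path


def clone_batch(cloner, lines, reference_audio, out_dir="outputs"):
    """Generate one file per (lang, text) pair using the same reference voice.

    `cloner` is any object exposing generate(text=, lang=, reference_audio=,
    output_path=), as in the example above.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (lang, text) in enumerate(lines):
        # File names such as 00_en.wav keep the outputs in input order.
        output_path = out_dir / f"{index:02d}_{lang}.wav"
        cloner.generate(
            text=text,
            lang=lang,
            reference_audio=reference_audio,
            output_path=str(output_path),
        )
        written.append(output_path)
    return written
```

With a real instance this might be called as `clone_batch(KokoClone(), [("en", "Hello"), ("fr", "Bonjour")], "reference.wav")`.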
Project Structure
app.py → Gradio Web Interface
cli.py → Command-line tool
core/cloner.py → Core inference engine
inference.py → Example usage script
model/ → Downloaded TTS model weights
voice/ → Voice embeddings
Use Cases
- Voice assistant prototypes
- Real-time TTS demos
- Multilingual narration tools
- Content creation
- Research experiments
- Interactive AI applications
Acknowledgments
This project builds upon:
- Kokoro-ONNX — for fast and efficient neural speech synthesis
- Kanade Tokenizer — for voice conversion architecture
License
Licensed under the Apache 2.0 License.