---
license: apache-2.0
language:
- en
base_model:
- hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
---
Releases
| Model | Published | Training Data | Langs & Voices | SHA256 |
|---|---|---|---|---|
| v1.0 | 2025 Jan 27 | Few hundred hrs | 8 & 54 | 496dba11 |
| v0.19 | 2024 Dec 25 | <100 hrs | 1 & 10 | 3b0c392f |

| Training Costs | v0.19 | v1.0 | Total |
|---|---|---|---|
| in A100 80GB GPU hours | 500 | 500 | 1000 |
| average hourly rate | $0.80/h | $1.20/h | $1/h |
| in USD | $400 | $600 | $1000 |
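
The figures in the cost table follow from GPU hours multiplied by the hourly rate. As a quick check, here is a minimal sketch of that arithmetic; the dictionary and variable names are illustrative and not part of any library.

```python
# GPU hours and average hourly rate (USD) per release, copied from the table above.
runs = {
    "v0.19": (500, 0.80),
    "v1.0": (500, 1.20),
}

# Cost per release = hours * rate.
costs = {name: hours * rate for name, (hours, rate) in runs.items()}

total_hours = sum(hours for hours, _ in runs.values())
total_cost = sum(costs.values())

# The "Total" hourly rate is a weighted average: total cost over total hours.
average_rate = total_cost / total_hours

for name, cost in costs.items():
    print(f"{name}: ${cost:.2f}")
print(f"total: {total_hours} h, ${total_cost:.2f}, ${average_rate:.2f}/h")
```

Because both releases used the same number of GPU hours, the weighted average rate equals the simple mean of the two rates.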
Usage
You can run this basic cell on Google Colab. Listen to samples. For more languages and details, see Advanced Usage.
```py
# Quote the requirement so the shell does not treat ">=" as a redirect.
!pip install -q "kokoro>=0.9.2" soundfile
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch
pipeline = KPipeline(lang_code='a')
text = '''
Qhash is an open-weight TTS model with 84 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Qhash-TTS can be deployed anywhere from production environments to personal projects.
'''
generator = pipeline(text, voice='af_heart')
# Each item is one chunk: its text (gs), its phonemes (ps), and the audio.
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    sf.write(f'{i}.wav', audio, 24000)
```
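
The loop above writes one WAV file per generated chunk. If a single file is preferable, the chunks can be joined before writing. The following is a minimal sketch using only the Python standard library; `write_combined_wav` is a hypothetical helper, not part of `kokoro` or `soundfile`, and it assumes each chunk is a one-dimensional sequence of mono float samples in the range [-1, 1] at 24000 Hz.

```python
import struct
import wave

SAMPLE_RATE = 24000  # same rate as the usage cell above


def write_combined_wav(path, segments, sample_rate=SAMPLE_RATE):
    """Write all segments back to back as one 16-bit PCM mono WAV.

    Hypothetical helper: assumes each segment is a 1-D sequence of
    floats in [-1, 1]. Returns the total duration in seconds.
    """
    total_samples = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        for segment in segments:
            # Clamp to [-1, 1] so out-of-range values cannot overflow int16.
            samples = [max(-1.0, min(1.0, float(x))) for x in segment]
            pcm = struct.pack(
                f"<{len(samples)}h", *(int(s * 32767) for s in samples)
            )
            wav.writeframes(pcm)
            total_samples += len(samples)
    return total_samples / sample_rate
```

Under those assumptions, it could be called as `write_combined_wav('combined.wav', (audio for _, _, audio in pipeline(text, voice='af_heart')))`. The per-sample Python loop is slow for long audio; it is meant to show the idea rather than to be efficient.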
Under the hood, Qhash-TTS uses misaki, a grapheme-to-phoneme (G2P) library: https://github.com/hexgrad/misaki