---
license: apache-2.0
language:
  - en
base_model:
  - hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
---

**Qhash-TTS** is an open-weight TTS model with 84 million parameters. Despite its lightweight architecture, it delivers quality comparable to that of larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Qhash-TTS can be deployed anywhere, from production environments to personal projects.

Releases

| Model | Published | Training Data | Langs & Voices | SHA256 |
| ----- | --------- | ------------- | -------------- | ------ |
| v1.0 | 2025 Jan 27 | Few hundred hrs | 8 & 54 | `496dba11` |
| v0.19 | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |

| Training Costs | v0.19 | v1.0 | Total |
| -------------- | ----- | ---- | ----- |
| in A100 80GB GPU hours | 500 | 500 | 1000 |
| average hourly rate | $0.80/h | $1.20/h | $1/h |
| in USD | $400 | $600 | $1000 |
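
The totals in the cost table follow from GPU hours multiplied by the hourly rate. As a quick check, the sketch below recomputes them from the per-version figures. The `runs` dictionary and the `summarize` helper are illustrative names, not part of any released tooling, and rates are stored in cents to avoid floating-point rounding.

```python
# Figures copied from the Training Costs table above.
# Rates are kept in integer cents so the arithmetic stays exact.
runs = {
    "v0.19": {"gpu_hours": 500, "cents_per_hour": 80},
    "v1.0": {"gpu_hours": 500, "cents_per_hour": 120},
}


def summarize(runs):
    """Return total GPU hours, total cost in USD, and the blended hourly rate."""
    total_hours = sum(r["gpu_hours"] for r in runs.values())
    total_cents = sum(r["gpu_hours"] * r["cents_per_hour"] for r in runs.values())
    return {
        "total_hours": total_hours,
        "total_usd": total_cents / 100,
        "average_usd_per_hour": total_cents / total_hours / 100,
    }


summary = summarize(runs)
print(summary)
```

The blended rate is weighted by hours; because both runs used the same number of GPU hours, it coincides with the simple mean of the two rates.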

Usage

You can run this basic cell on Google Colab. Listen to samples. For more languages and details, see Advanced Usage.

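```python
# Quote the requirement so the shell does not treat ">=0.9.2" as an output redirect.
!pip install -q "kokoro>=0.9.2" soundfile
# espeak-ng is a system package, installed here with apt.
!apt-get -qq -y install espeak-ng > /dev/null 2>&1
from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch

# lang_code selects the language of the pipeline.
pipeline = KPipeline(lang_code='a')
text = '''
 Qhash is an open-weight TTS model with 84 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Qhash-TTS can be deployed anywhere from production environments to personal projects.
'''
# The pipeline yields one (graphemes, phonemes, audio) tuple per text chunk.
generator = pipeline(text, voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)  # chunk index, graphemes, phonemes
    # Play each chunk inline at 24 kHz; only the first chunk autoplays.
    display(Audio(data=audio, rate=24000, autoplay=i==0))
    # Save each chunk to its own file: 0.wav, 1.wav, ...
    sf.write(f'{i}.wav', audio, 24000)
```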
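
The cell above writes one WAV file per generated chunk. If a single output file is more convenient, the chunks can be concatenated before writing. The following is a minimal sketch using only the Python standard library. `write_wav` is a hypothetical helper, not part of `kokoro`, and it assumes each chunk is a one-dimensional sequence of float samples in the range [-1, 1] at 24000 Hz.

```python
import struct
import wave


def write_wav(path, segments, sample_rate=24000):
    """Concatenate float audio segments into one 16-bit mono WAV file.

    Each segment is assumed to be an iterable of samples in [-1, 1].
    Returns the number of frames written.
    """
    frames = bytearray()
    for segment in segments:
        for sample in segment:
            # Clip out-of-range values, then scale to signed 16-bit PCM.
            clipped = max(-1.0, min(1.0, float(sample)))
            frames += struct.pack('<h', int(round(clipped * 32767)))
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(frames) // 2
```

With the pipeline from the cell above, a call might look like `write_wav('speech.wav', (audio.tolist() for _, _, audio in pipeline(text, voice='af_heart')))`, assuming each `audio` value supports `.tolist()`, as tensors and NumPy arrays do. The per-sample loop is slow for long audio; it is written this way only to avoid extra dependencies.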

Under the hood, Qhash-TTS uses misaki, a grapheme-to-phoneme (G2P) library, to convert input text into phonemes: https://github.com/hexgrad/misaki