---
license: apache-2.0
language:
- en
base_model:
- hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
---
**Qhash-TTS** is an open-weight TTS model with 84 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Qhash-TTS can be deployed anywhere from production environments to personal projects.

<audio controls><source src="https://huggingface.co/Quantamhash/Qhash-TTS/resolve/main/samples/HEARME.wav" type="audio/wav"></audio>

### Releases

| Model | Published | Training Data | Langs & Voices | SHA256 |
| ----- | --------- | ------------- | -------------- | ------ |
| **v1.0** | **2025 Jan 27** | **Few hundred hrs** | [**8 & 54**](https://huggingface.co/Quantamhash/Qhash-TTS/blob/main/VOICES.md) | `496dba11` |
| [v0.19] | 2024 Dec 25 | <100 hrs | 1 & 10 | `3b0c392f` |

| Training Costs | v0.19 | v1.0 | **Total** |
| -------------- | ----- | ---- | --------- |
| in A100 80GB GPU hours | 500 | 500 | **1000** |
| average hourly rate | $0.80/h | $1.20/h | **$1/h** |
| in USD | $400 | $600 | **$1000** |
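The cost figures above are internally consistent: each version's spend is its GPU hours times its hourly rate, and the $1/h average is total spend divided by total hours (a weighted average of the two rates, which coincides with the plain average here because both runs used equal hours). A quick arithmetic check:

```python
# GPU hours and hourly rates for each training run, from the table above.
hours = {'v0.19': 500, 'v1.0': 500}
rates = {'v0.19': 0.80, 'v1.0': 1.20}

cost = {v: hours[v] * rates[v] for v in hours}  # per-version spend in USD
total_hours = sum(hours.values())               # 1000 A100 80GB GPU hours
total_cost = sum(cost.values())                 # $1000 total
avg_rate = total_cost / total_hours             # the table's $1/h average
print(cost, total_hours, total_cost, avg_rate)
```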

### Usage

You can run this basic cell on [Google Colab](https://colab.research.google.com/). [Listen to samples](https://huggingface.co/Quantamhash/Qhash-TTS/blob/main/SAMPLES.md). For more languages and details, see [Advanced Usage](https://github.com/hexgrad/kokoro?tab=readme-ov-file#advanced-usage).
```py
!pip install -q "kokoro>=0.9.2" soundfile
!apt-get -qq -y install espeak-ng > /dev/null 2>&1

from kokoro import KPipeline
from IPython.display import display, Audio
import soundfile as sf
import torch

pipeline = KPipeline(lang_code='a')  # 'a' selects American English
text = '''
Qhash is an open-weight TTS model with 84 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Qhash-TTS can be deployed anywhere from production environments to personal projects.
'''
generator = pipeline(text, voice='af_heart')
for i, (gs, ps, audio) in enumerate(generator):
    print(i, gs, ps)  # segment index, graphemes, phonemes
    display(Audio(data=audio, rate=24000, autoplay=i == 0))
    sf.write(f'{i}.wav', audio, 24000)  # save each segment as 24 kHz WAV
```
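The loop above writes one WAV file per generated segment. If a single output file is preferred, the segment arrays can be concatenated before writing. A minimal sketch of just that step, using silent dummy arrays in place of real model output so it runs without the model installed:

```python
import numpy as np

SAMPLE_RATE = 24000  # Qhash-TTS outputs 24 kHz audio

# Dummy stand-ins for the per-segment float arrays yielded by the pipeline.
segments = [
    np.zeros(SAMPLE_RATE, dtype=np.float32),       # 1.0 s segment
    np.zeros(SAMPLE_RATE // 2, dtype=np.float32),  # 0.5 s segment
]

# Join all segments into one array; in the real pipeline you would then
# call sf.write('combined.wav', combined, SAMPLE_RATE) once.
combined = np.concatenate(segments)
duration_s = combined.shape[0] / SAMPLE_RATE
print(f'{duration_s:.1f} s')
```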

Under the hood, `Qhash-TTS` uses [`misaki`](https://pypi.org/project/misaki/), a G2P (grapheme-to-phoneme) library hosted at https://github.com/hexgrad/misaki.
|