metadata
license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
language:
- en
- es
- fr
- hi
- it
- ja
- pt
- zh
tags:
- text-to-speech
- tts
- kokoro
- onnx
- safetensors
library_name: nobodywho
Kokoro v1
Model Capabilities
- Text-to-speech — 24 kHz mono output
- Multilingual — American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
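- 54 voices across 9 languages, named <lang><gender>_<name> (e.g. af_heart, bm_george, jf_alpha), where <lang> is a one-letter language code and <gender> is f or m; af_heart, for example, is an American English female voice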
The full description can be found on the original model page (hexgrad/Kokoro-82M).
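The voice naming convention above can be checked in code before a voice ID is passed to the model. The following is a minimal sketch, not part of NobodyWho: `VoiceId`, `parse_voice_id`, and `language_name` are hypothetical helpers, and the letter-to-language table is an assumption based on the upstream Kokoro voice list rather than something this card states.

```rust
/// Parts of a Kokoro voice ID such as `af_heart` (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct VoiceId<'a> {
    pub lang: char,
    pub gender: char,
    pub name: &'a str,
}

/// Split `<lang><gender>_<name>` into its parts; returns `None` when malformed.
pub fn parse_voice_id(id: &str) -> Option<VoiceId<'_>> {
    // Everything before the first underscore is the two-letter prefix.
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let lang = chars.next()?;
    let gender = chars.next()?;
    // The prefix must be exactly two characters and the name non-empty.
    if chars.next().is_some() || name.is_empty() {
        return None;
    }
    // Assumption: only `f` and `m` are used as gender letters.
    if gender != 'f' && gender != 'm' {
        return None;
    }
    Some(VoiceId { lang, gender, name })
}

/// Language for a prefix letter. Assumed mapping, not stated in this card.
pub fn language_name(lang: char) -> Option<&'static str> {
    match lang {
        'a' => Some("American English"),
        'b' => Some("British English"),
        'e' => Some("Spanish"),
        'f' => Some("French"),
        'h' => Some("Hindi"),
        'i' => Some("Italian"),
        'j' => Some("Japanese"),
        'p' => Some("Brazilian Portuguese"),
        'z' => Some("Mandarin Chinese"),
        _ => None,
    }
}
```

With these helpers, `parse_voice_id("bm_george")` yields language letter `b`, gender `m`, and name `george`, while an ID without an underscore is rejected.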
Getting Started
Run with NobodyWho (the model is fetched and cached on first use):
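use nobodywho::tts::{Tts, TtsConfig};
// Load Kokoro; the model is fetched and cached on first use.
let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
// Synthesize speech for the given text.
let wav = tts.synthesize("Hello from NobodyWho!")?;
// Save the result as a WAV file.
std::fs::write("hello.wav", wav)?;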
Benchmarks
Measured with NobodyWho on an Apple M4 Pro, running on the CPU:
| Input | Audio duration | Wall-clock time | Real-time factor (audio ÷ wall-clock) |
|---|---|---|---|
| 10 words | 3.3s | ~0.48s | 6.9× |
| 30 words | 10.9s | ~1.35s | 8.1× |
| 70 words | 18.7s | ~2.26s | 8.3× |
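The real-time factor in the table is the audio duration divided by the wall-clock time, so higher means faster. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of NobodyWho, and the sample rate is taken from the 24 kHz mono figure in the capabilities list.

```rust
/// Output sample rate from the capabilities list (24 kHz mono).
pub const SAMPLE_RATE_HZ: u32 = 24_000;

/// Audio duration in seconds for a given number of mono samples.
pub fn audio_seconds(samples: usize) -> f64 {
    samples as f64 / SAMPLE_RATE_HZ as f64
}

/// Seconds of audio produced per second of wall-clock time.
/// Values above 1.0 mean synthesis runs faster than playback.
pub fn real_time_factor(audio_secs: f64, wall_secs: f64) -> f64 {
    audio_secs / wall_secs
}
```

For the first row, `real_time_factor(3.3, 0.48)` is 6.875, which the table reports as 6.9×.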
Credits
Original model and training by @hexgrad. Thanks!