---
license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
language:
- en
- es
- fr
- hi
- it
- ja
- pt
- zh
tags:
- text-to-speech
- tts
- kokoro
- onnx
- safetensors
library_name: nobodywho
---

# Kokoro v1

## Model Capabilities

- **Text-to-speech**: 24 kHz mono output
- **Multilingual**: American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
- **54 voices** across 9 language variants (8 languages, with American and British English counted separately), named `<lang><gender>_<name>` (e.g. `af_heart`, `bm_george`, `jf_alpha`)

The full description can be found at [the original model page](https://huggingface.co/hexgrad/Kokoro-82M).

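
The voice naming convention can be unpacked mechanically. The sketch below splits a voice ID into language, gender, and name. The `Voice` struct and `parse_voice` function are hypothetical helpers, not part of NobodyWho, and the letter-to-language table is an assumption based on the upstream Kokoro voice list (the examples in this card only confirm `a`, `b`, and `j`).

```rust
/// Parsed form of a voice ID such as `af_heart` (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Voice<'a> {
    language: &'static str,
    gender: &'static str,
    name: &'a str,
}

/// Splits `<lang><gender>_<name>` into its parts.
/// Returns `None` for anything that does not match the pattern.
fn parse_voice<'a>(id: &'a str) -> Option<Voice<'a>> {
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let (lang, gender) = (chars.next()?, chars.next()?);
    // The prefix must be exactly two characters and the name non-empty.
    if chars.next().is_some() || name.is_empty() {
        return None;
    }
    // Assumed mapping, following the upstream Kokoro voice list.
    let language = match lang {
        'a' => "American English",
        'b' => "British English",
        'e' => "Spanish",
        'f' => "French",
        'h' => "Hindi",
        'i' => "Italian",
        'j' => "Japanese",
        'p' => "Brazilian Portuguese",
        'z' => "Mandarin Chinese",
        _ => return None,
    };
    let gender = match gender {
        'f' => "female",
        'm' => "male",
        _ => return None,
    };
    Some(Voice { language, gender, name })
}

fn main() {
    for id in ["af_heart", "bm_george", "jf_alpha"] {
        println!("{id} -> {:?}", parse_voice(id));
    }
}
```

Returning `None` for unknown prefixes keeps the helper strict, so a typo in a voice ID is caught before it reaches the synthesizer.
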
## Getting Started

Run with NobodyWho (the model is fetched and cached on first use):

```rust
use nobodywho::tts::{Tts, TtsConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The model is fetched and cached the first time this runs.
    let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
    // Synthesize the text and save the result as a WAV file.
    let wav = tts.synthesize("Hello from NobodyWho!")?;
    std::fs::write("hello.wav", wav)?;
    Ok(())
}
```

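
To sanity-check the result against the advertised 24 kHz mono format, the header of `hello.wav` can be inspected with the standard library alone. This is a minimal sketch that assumes the file is a canonical RIFF/WAVE file whose `fmt ` chunk comes first; `wav_format` is a hypothetical helper, not a NobodyWho API.

```rust
/// Reads (channels, sample_rate) from a canonical RIFF/WAVE header,
/// i.e. one whose `fmt ` chunk starts at byte 12 (an assumption about the file).
fn wav_format(bytes: &[u8]) -> Option<(u16, u32)> {
    if bytes.len() < 28
        || &bytes[0..4] != b"RIFF"
        || &bytes[8..12] != b"WAVE"
        || &bytes[12..16] != b"fmt "
    {
        return None;
    }
    // Channel count and sample rate are stored little-endian.
    let channels = u16::from_le_bytes([bytes[22], bytes[23]]);
    let sample_rate = u32::from_le_bytes([bytes[24], bytes[25], bytes[26], bytes[27]]);
    Some((channels, sample_rate))
}

fn main() {
    match std::fs::read("hello.wav") {
        Ok(bytes) => match wav_format(&bytes) {
            Some((channels, rate)) => println!("{channels} channel(s) at {rate} Hz"),
            None => println!("not a canonical WAV header"),
        },
        Err(err) => println!("could not read hello.wav: {err}"),
    }
}
```

If the output format turns out to use a different chunk order, a full WAV parser would be the safer choice.
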
## Benchmarks

Measured with `nobodywho` on **Apple M4 Pro, CPU**. The real-time factor is audio duration divided by wall-clock time, so higher is faster:

| Input    | Audio duration | Wall-clock time | Real-time factor |
|----------|----------------|-----------------|------------------|
| 10 words | 3.3 s          | ~0.48 s         | **6.9×**         |
| 30 words | 10.9 s         | ~1.35 s         | **8.1×**         |
| 70 words | 18.7 s         | ~2.26 s         | **8.3×**         |

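
The real-time factor column is consistent with audio duration divided by wall-clock time. A small sketch that recomputes it from the table's figures (`real_time_factor` is an illustrative helper, and the inputs are the rounded values shown in the table, so results match the column only to within rounding):

```rust
/// Speed relative to real time: seconds of audio produced per second of compute.
fn real_time_factor(audio_secs: f64, wall_secs: f64) -> f64 {
    audio_secs / wall_secs
}

fn main() {
    // (input, audio seconds, wall-clock seconds) copied from the benchmark table.
    let rows = [
        ("10 words", 3.3, 0.48),
        ("30 words", 10.9, 1.35),
        ("70 words", 18.7, 2.26),
    ];
    for (label, audio, wall) in rows {
        println!("{label}: {:.2}x real time", real_time_factor(audio, wall));
    }
}
```
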
## Credits

Original model and training by [@hexgrad](https://huggingface.co/hexgrad). Thanks!