---
license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
language:
- en
- es
- fr
- hi
- it
- ja
- pt
- zh
tags:
- text-to-speech
- tts
- kokoro
- onnx
- safetensors
library_name: nobodywho
---

# Kokoro v1

## Model Capabilities

- **Text-to-speech**: 24 kHz mono output
- **Multilingual**: American English, British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
- **54 voices** across 9 languages (counting American and British English separately), named `<language><gender>_<name>` (e.g. `af_heart`, `bm_george`, `jf_alpha`)

The full description is available on [the original model page](https://huggingface.co/hexgrad/Kokoro-82M).

## Getting Started

Run with NobodyWho (the model is fetched and cached on first use). The snippet uses `?`, so it must run inside a function that returns a `Result`:

```rust
use nobodywho::tts::{Tts, TtsConfig};

// Load the Kokoro model; it is downloaded and cached on first use.
let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;

// Synthesize speech and save the result to disk.
let wav = tts.synthesize("Hello from NobodyWho!")?;
std::fs::write("hello.wav", wav)?;
```

## Benchmarks

Measured with `nobodywho` on **Apple M4 Pro, CPU**:

| Input    | Audio duration | Wall-clock time | Real-time factor |
|----------|----------------|-----------------|------------------|
| 10 words | 3.3s           | ~0.48s          | **6.9×**         |
| 30 words | 10.9s          | ~1.35s          | **8.1×**         |
| 70 words | 18.7s          | ~2.26s          | **8.3×**         |

## Credits

Original model and training by [@hexgrad](https://huggingface.co/hexgrad). Thanks!
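
## Computing the Real-Time Factor

The real-time factor in the benchmark table appears to be the audio duration divided by the wall-clock synthesis time, so values above 1 mean faster than real time. That definition is an assumption, though it matches the listed figures. The sketch below recomputes the column from the table's own numbers; `real_time_factor` is an illustrative helper, not part of NobodyWho.

```rust
/// Real-time factor, assumed here to be audio seconds produced
/// per second of wall-clock synthesis time.
fn real_time_factor(audio_secs: f64, wall_secs: f64) -> f64 {
    audio_secs / wall_secs
}

fn main() {
    // (label, audio duration in seconds, wall-clock time in seconds),
    // copied from the benchmark table.
    let rows = [
        ("10 words", 3.3, 0.48),
        ("30 words", 10.9, 1.35),
        ("70 words", 18.7, 2.26),
    ];

    for (label, audio, wall) in rows {
        // Print each factor rounded to one decimal place.
        println!("{label}: {:.1}x", real_time_factor(audio, wall));
    }
}
```

Rounded to one decimal place, the three ratios agree with the 6.9×, 8.1×, and 8.3× entries in the table.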