---
license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
language:
  - en
  - es
  - fr
  - hi
  - it
  - ja
  - pt
  - zh
tags:
  - text-to-speech
  - tts
  - kokoro
  - onnx
  - safetensors
library_name: nobodywho
---

# Kokoro v1

## Model Capabilities

- **Text-to-speech** — 24 kHz mono output
- **Multilingual** — American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
- **54 voices** across 9 language variants (American and British English are counted separately), named `<lang><gender>_<name>` (e.g. `af_heart`, `bm_george`, `jf_alpha`)
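
To make the naming convention concrete, here is a minimal sketch that splits a voice ID into its parts. The `Voice` struct and `parse_voice` function are hypothetical helpers, not part of NobodyWho, and the mapping from prefix letter to language is an assumption taken from the upstream Kokoro voice list rather than from this page.

```rust
/// Parsed form of a voice ID such as `af_heart` (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct Voice<'a> {
    language: &'static str,
    gender: &'static str,
    name: &'a str,
}

/// Splits `<lang><gender>_<name>` into its parts; returns `None` for anything else.
fn parse_voice(id: &str) -> Option<Voice<'_>> {
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let (lang, gender) = (chars.next()?, chars.next()?);
    // The prefix must be exactly two characters and the name must be non-empty.
    if chars.next().is_some() || name.is_empty() {
        return None;
    }
    // Assumed prefix letters, following the upstream Kokoro voice list.
    let language = match lang {
        'a' => "American English",
        'b' => "British English",
        'e' => "Spanish",
        'f' => "French",
        'h' => "Hindi",
        'i' => "Italian",
        'j' => "Japanese",
        'p' => "Brazilian Portuguese",
        'z' => "Mandarin Chinese",
        _ => return None,
    };
    let gender = match gender {
        'f' => "female",
        'm' => "male",
        _ => return None,
    };
    Some(Voice { language, gender, name })
}
```

Under these assumptions, `parse_voice("bm_george")` yields a British English male voice named `george`, while an ID without an underscore or with an unknown prefix letter yields `None`.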

The full description can be found at [the original model page](https://huggingface.co/hexgrad/Kokoro-82M).

## Getting Started

Run with NobodyWho (the model is fetched and cached on first use):
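```rust
use nobodywho::tts::{Tts, TtsConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fetches and caches the model on first use.
    let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
    // Synthesize speech and write the returned audio to disk.
    let wav = tts.synthesize("Hello from NobodyWho!")?;
    std::fs::write("hello.wav", wav)?;
    Ok(())
}
```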

## Benchmarks

Measured with `nobodywho` on **Apple M4 Pro, CPU**:

| Input    | Audio length | Wallclock | Real-time factor (audio ÷ wallclock) |
|----------|--------------|-----------|--------------------------------------|
| 10 words | 3.3 s        | ~0.48 s   | **6.9×**                             |
| 30 words | 10.9 s       | ~1.35 s   | **8.1×**                             |
| 70 words | 18.7 s       | ~2.26 s   | **8.3×**                             |
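
The last column can be reproduced from the other two: the factor here is seconds of audio produced per second of wallclock time, so higher is faster. The sketch below recomputes it from the table rows; `real_time_factor`, `ROWS`, and `format_rows` are illustrative names, not part of NobodyWho.

```rust
/// Speed relative to real time: seconds of audio produced per second of wallclock.
fn real_time_factor(audio_secs: f64, wallclock_secs: f64) -> f64 {
    audio_secs / wallclock_secs
}

/// (input, audio seconds, wallclock seconds) rows copied from the table above.
const ROWS: [(&str, f64, f64); 3] = [
    ("10 words", 3.3, 0.48),
    ("30 words", 10.9, 1.35),
    ("70 words", 18.7, 2.26),
];

/// Formats each row with the factor rounded to one decimal place, as in the table.
fn format_rows() -> Vec<String> {
    ROWS.iter()
        .map(|(input, audio, wall)| {
            format!("{input}: {:.1}×", real_time_factor(*audio, *wall))
        })
        .collect()
}
```

For the first row, 3.3 s of audio in about 0.48 s of wallclock gives 3.3 / 0.48 ≈ 6.9, matching the table.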

## Credits

Original model and training by [@hexgrad](https://huggingface.co/hexgrad). Thanks!