metadata
license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
language:
  - en
  - es
  - fr
  - hi
  - it
  - ja
  - pt
  - zh
tags:
  - text-to-speech
  - tts
  - kokoro
  - onnx
  - safetensors
library_name: nobodywho

Kokoro v1

Model Capabilities

  • Text-to-speech — 24 kHz mono output
  • Multilingual — American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
  • 54 voices across 9 languages (American and British English counted separately), named <lang><gender>_<name>: the first letter is the language, the second the gender (e.g. af_heart, bm_george, jf_alpha)
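
The naming convention above can be decoded mechanically. The following is a minimal sketch, not part of NobodyWho: `VoiceId`, `Gender`, and `parse_voice_id` are hypothetical names, and the letter-to-language table is an assumption based on the upstream Kokoro voice list (only the `a`, `b`, and `j` prefixes appear in the examples above).

```rust
/// Gender encoded in the second letter of a voice ID.
#[derive(Debug, PartialEq)]
pub enum Gender {
    Female,
    Male,
}

/// A voice ID split into its parts (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct VoiceId<'a> {
    pub language: &'static str,
    pub gender: Gender,
    pub name: &'a str,
}

/// Parses IDs of the form `<lang><gender>_<name>`, e.g. `af_heart`.
/// Returns `None` when the ID does not match that shape.
pub fn parse_voice_id(id: &str) -> Option<VoiceId<'_>> {
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let (lang, gender) = (chars.next()?, chars.next()?);
    // The prefix must be exactly two characters and the name non-empty.
    if chars.next().is_some() || name.is_empty() {
        return None;
    }
    // Assumed mapping, following the upstream Kokoro voice list.
    let language = match lang {
        'a' => "American English",
        'b' => "British English",
        'e' => "Spanish",
        'f' => "French",
        'h' => "Hindi",
        'i' => "Italian",
        'j' => "Japanese",
        'p' => "Brazilian Portuguese",
        'z' => "Mandarin Chinese",
        _ => return None,
    };
    let gender = match gender {
        'f' => Gender::Female,
        'm' => Gender::Male,
        _ => return None,
    };
    Some(VoiceId { language, gender, name })
}
```

Under these assumptions, `parse_voice_id("bm_george")` yields a British English male voice named `george`, while an ID without an underscore or with an unknown prefix letter yields `None`.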

The full description can be found on the original model page (hexgrad/Kokoro-82M).

Getting Started

Run with NobodyWho (the model is fetched and cached on first use):

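use nobodywho::tts::{Tts, TtsConfig};

// Assumes nobodywho's error types implement std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The model is downloaded and cached on first use.
    let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
    let wav = tts.synthesize("Hello from NobodyWho!")?;
    std::fs::write("hello.wav", wav)?; // save the synthesized audio
    Ok(())
}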
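
To check that the generated audio matches the 24 kHz mono format listed under Model Capabilities, one option is to inspect the WAV header. This is a sketch under two assumptions: that `synthesize` returns a complete WAV file, and that the file uses the canonical layout in which the `fmt ` chunk directly follows the RIFF/WAVE header. `WavFormat` and `read_wav_format` are hypothetical helpers, not part of NobodyWho.

```rust
/// Basic format fields read from a canonical PCM WAV header.
#[derive(Debug, PartialEq)]
pub struct WavFormat {
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

/// Reads channel count, sample rate, and bit depth from WAV bytes.
/// Returns `None` if the data is too short or the `fmt ` chunk is not
/// located right after the RIFF/WAVE header (an assumption of this sketch).
pub fn read_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 36
        || &bytes[0..4] != b"RIFF"
        || &bytes[8..12] != b"WAVE"
        || &bytes[12..16] != b"fmt "
    {
        return None;
    }
    // All multi-byte fields in a RIFF file are little-endian.
    let channels = u16::from_le_bytes([bytes[22], bytes[23]]);
    let sample_rate = u32::from_le_bytes([bytes[24], bytes[25], bytes[26], bytes[27]]);
    let bits_per_sample = u16::from_le_bytes([bytes[34], bytes[35]]);
    Some(WavFormat { channels, sample_rate, bits_per_sample })
}
```

For output that matches the capabilities listed above, one would expect `channels == 1` and `sample_rate == 24000`. The bit depth is not stated in this README, so the sketch only reports it.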

Benchmarks

Measured with nobodywho on Apple M4 Pro, CPU:

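| Input | Audio duration | Wall-clock time | Real-time factor (audio ÷ wall-clock) |
|---|---|---|---|
| 10 words | 3.3 s | ~0.48 s | 6.9× |
| 30 words | 10.9 s | ~1.35 s | 8.1× |
| 70 words | 18.7 s | ~2.26 s | 8.3× |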
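
The real-time factors in the table are consistent with dividing audio duration by wall-clock time, so values above 1× mean synthesis runs faster than playback. Some TTS reports use the inverse ratio, so this definition is inferred from the numbers rather than stated by the authors. A minimal sketch for reproducing the column, with `speed_factor` as a hypothetical helper name:

```rust
/// Seconds of audio produced per second of wall-clock time.
/// Returns `None` for a non-positive wall-clock time.
pub fn speed_factor(audio_secs: f64, wall_secs: f64) -> Option<f64> {
    if wall_secs > 0.0 {
        Some(audio_secs / wall_secs)
    } else {
        None
    }
}
```

For the first row, `speed_factor(3.3, 0.48)` gives about 6.9, which matches the table after rounding.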

Credits

Original model and training by @hexgrad. Thanks!