Kokoro-82M / README.md

krebsm

Update README.md

0c59855 verified about 2 months ago

metadata

license: apache-2.0
base_model: hexgrad/Kokoro-82M
pipeline_tag: text-to-speech
language:
  - en
  - es
  - fr
  - hi
  - it
  - ja
  - pt
  - zh
tags:
  - text-to-speech
  - tts
  - kokoro
  - onnx
  - safetensors
library_name: nobodywho

Kokoro v1

Model Capabilities

Text-to-speech — 24 kHz mono output
Multilingual — American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
54 voices across 9 language variants (8 languages, with American and British English counted separately), named <lang><gender>_<name> (e.g. af_heart, bm_george, jf_alpha)
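The naming convention above can be checked mechanically. The following is a minimal sketch, not part of NobodyWho or Kokoro: `VoiceId` and `parse_voice_id` are hypothetical names, and the rule that the gender letter is always `f` or `m` is an assumption drawn from the three examples listed.

```rust
/// Parts of a voice ID such as `af_heart` (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct VoiceId<'a> {
    /// First character: language variant prefix (e.g. 'a', 'b', 'j').
    pub lang: char,
    /// Second character: 'f' or 'm'.
    pub gender: char,
    /// Everything after the underscore (e.g. "heart").
    pub name: &'a str,
}

/// Splits `<lang><gender>_<name>` into its parts.
/// Returns None if the string does not match that shape.
pub fn parse_voice_id(id: &str) -> Option<VoiceId<'_>> {
    let (prefix, name) = id.split_once('_')?;
    let mut chars = prefix.chars();
    let lang = chars.next()?;
    let gender = chars.next()?;
    // The prefix must be exactly two characters and the name non-empty.
    if chars.next().is_some() || name.is_empty() {
        return None;
    }
    // Assumption: the gender letter is always 'f' or 'm', as in the examples.
    if !lang.is_ascii_lowercase() || !matches!(gender, 'f' | 'm') {
        return None;
    }
    Some(VoiceId { lang, gender, name })
}
```

For instance, `parse_voice_id("bm_george")` yields language prefix `b`, gender `m`, and name `george`, while a string without an underscore is rejected.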

See the original model page (hexgrad/Kokoro-82M) for the full description.

Getting Started

Run with NobodyWho (the model is fetched and cached on first use):

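use nobodywho::tts::{Tts, TtsConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Fetches and caches the model on first use.
    let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
    // Synthesize speech and write the resulting audio to disk.
    let wav = tts.synthesize("Hello from NobodyWho!")?;
    std::fs::write("hello.wav", wav)?;
    Ok(())
}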
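To confirm that the output matches the stated 24 kHz mono format, the header of the generated file can be inspected. This is a rough sketch under two assumptions: `synthesize` returns the bytes of a complete WAV file (the snippet above writes them straight to `hello.wav`), and that file uses the common layout in which the `fmt ` chunk directly follows the RIFF header. `WavFormat` and `read_wav_format` are hypothetical names, not part of NobodyWho.

```rust
/// Basic format fields read from a WAV header (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct WavFormat {
    pub channels: u16,
    pub sample_rate: u32,
    pub bits_per_sample: u16,
}

/// Reads channel count, sample rate, and bit depth from WAV bytes.
/// Assumes the `fmt ` chunk directly follows the RIFF header and
/// returns None for any other layout.
pub fn read_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 36
        || &bytes[0..4] != b"RIFF"
        || &bytes[8..12] != b"WAVE"
        || &bytes[12..16] != b"fmt "
    {
        return None;
    }
    // All multi-byte fields in a RIFF/WAVE header are little-endian.
    Some(WavFormat {
        channels: u16::from_le_bytes([bytes[22], bytes[23]]),
        sample_rate: u32::from_le_bytes([bytes[24], bytes[25], bytes[26], bytes[27]]),
        bits_per_sample: u16::from_le_bytes([bytes[34], bytes[35]]),
    })
}
```

Passing the contents of `hello.wav` (for example via `std::fs::read`) should report 1 channel at 24000 Hz if both assumptions hold. The bit depth is not stated in this README, so treat that field as informational.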

Benchmarks

Measured with NobodyWho on an Apple M4 Pro, running on CPU:

Input	Audio duration	Wall-clock time	Real-time factor (audio duration / wall-clock time)
10 words	3.3s	~0.48s	6.9×
30 words	10.9s	~1.35s	8.1×
70 words	18.7s	~2.26s	8.3×
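The last column is the audio duration divided by the wall-clock time, so values above 1 mean faster than real time. A small sketch of how such a figure could be reproduced follows; `time_it` and `real_time_factor` are hypothetical helper names, and the actual benchmark harness used for the table is not described in this README.

```rust
use std::time::Instant;

/// Runs `f` once and returns its result with the elapsed wall-clock seconds.
pub fn time_it<T>(f: impl FnOnce() -> T) -> (T, f64) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed().as_secs_f64())
}

/// Seconds of audio produced per second of wall-clock time.
/// Returns None when the wall-clock time is not positive or the
/// audio duration is negative.
pub fn real_time_factor(audio_secs: f64, wallclock_secs: f64) -> Option<f64> {
    if wallclock_secs > 0.0 && audio_secs >= 0.0 {
        Some(audio_secs / wallclock_secs)
    } else {
        None
    }
}
```

With the first row of the table, `real_time_factor(3.3, 0.48)` gives about 6.875, which rounds to the listed 6.9×.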

Credits

Original model and training by @hexgrad. Thanks!