NobodyWho / Kokoro-82M

	---
	license: apache-2.0
	base_model: hexgrad/Kokoro-82M
	pipeline_tag: text-to-speech
	language:
	- en
	- es
	- fr
	- hi
	- it
	- ja
	- pt
	- zh
	tags:
	- text-to-speech
	- tts
	- kokoro
	- onnx
	- safetensors
	library_name: nobodywho
	---

	# Kokoro v1

	## Model Capabilities

	- Text-to-speech — 24 kHz mono output
	- Multilingual — American/British English, Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, Mandarin Chinese
	- 54 voices across 9 languages, naming convention `<lang><gender>_<name>` (e.g. `af_heart`, `bm_george`, `jf_alpha`)
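
	The naming convention above can be taken apart mechanically. The following is a minimal sketch, not part of the `nobodywho` API: `VoiceId`, `parse_voice`, and `language_label` are hypothetical helper names, and the letter-to-language table is an assumption based on the upstream Kokoro voice list (only `a`, `b`, and `j` appear in the examples on this card).

	```rust
	/// Parts of a voice name such as `af_heart` (hypothetical helper type).
	#[derive(Debug, PartialEq)]
	pub struct VoiceId<'a> {
	    /// First prefix letter: language or accent code.
	    pub lang: char,
	    /// Second prefix letter: gender code (assumed `f` = female, `m` = male).
	    pub gender: char,
	    /// Everything after the underscore.
	    pub name: &'a str,
	}

	/// Splits `<lang><gender>_<name>` into its parts.
	/// Returns `None` if the prefix is not exactly two characters or the name is empty.
	pub fn parse_voice(id: &str) -> Option<VoiceId<'_>> {
	    let (prefix, name) = id.split_once('_')?;
	    let mut chars = prefix.chars();
	    let lang = chars.next()?;
	    let gender = chars.next()?;
	    if chars.next().is_some() || name.is_empty() {
	        return None;
	    }
	    Some(VoiceId { lang, gender, name })
	}

	/// Maps a language letter to a readable label.
	/// ASSUMPTION: this table follows the upstream Kokoro voice list; it is not stated on this card.
	pub fn language_label(lang: char) -> Option<&'static str> {
	    match lang {
	        'a' => Some("American English"),
	        'b' => Some("British English"),
	        'e' => Some("Spanish"),
	        'f' => Some("French"),
	        'h' => Some("Hindi"),
	        'i' => Some("Italian"),
	        'j' => Some("Japanese"),
	        'p' => Some("Brazilian Portuguese"),
	        'z' => Some("Mandarin Chinese"),
	        _ => None,
	    }
	}
	```

	For example, `parse_voice("bm_george")` yields language `b`, gender `m`, and name `george`, while a string without an underscore yields `None`.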

	The full description can be found at [the original model page](https://huggingface.co/hexgrad/Kokoro-82M).

	## Getting Started

	Run with NobodyWho (the model is fetched and cached on first use):
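	```rust
	use nobodywho::tts::{Tts, TtsConfig};

	fn main() -> Result<(), Box<dyn std::error::Error>> {
	    // Load the Kokoro model (fetched and cached on first use).
	    let tts = Tts::new(TtsConfig::kokoro("NobodyWho/kokoro-v1"))?;
	    // Synthesize speech and write the result to a WAV file.
	    let wav = tts.synthesize("Hello from NobodyWho!")?;
	    std::fs::write("hello.wav", wav)?;
	    Ok(())
	}
	```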

	## Benchmarks

	Measured with `nobodywho` on an Apple M4 Pro, running on the CPU:

	| Input | Audio | Wallclock | Real-time factor (audio ÷ wallclock) |
	|----------|--------|-----------|------------------|
	| 10 words | 3.3 s | ~0.48 s | 6.9× |
	| 30 words | 10.9 s | ~1.35 s | 8.1× |
	| 70 words | 18.7 s | ~2.26 s | 8.3× |
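
	The real-time factor in the table is the audio duration divided by the wallclock time, so values above 1 mean synthesis runs faster than playback. The sketch below shows that arithmetic together with the duration of a 24 kHz mono buffer. The function names are illustrative and are not part of `nobodywho`.

	```rust
	/// Output sample rate stated on this card (24 kHz, mono).
	pub const SAMPLE_RATE_HZ: f64 = 24_000.0;

	/// Duration in seconds of a mono buffer holding `num_samples` samples.
	pub fn audio_seconds(num_samples: usize) -> f64 {
	    num_samples as f64 / SAMPLE_RATE_HZ
	}

	/// Real-time factor as used in the table: seconds of audio produced
	/// per second of wallclock time.
	pub fn real_time_factor(audio_secs: f64, wallclock_secs: f64) -> f64 {
	    audio_secs / wallclock_secs
	}

	/// Rounds to one decimal place, matching the precision of the table.
	pub fn round1(x: f64) -> f64 {
	    (x * 10.0).round() / 10.0
	}
	```

	With the first row, 3.3 s of audio in about 0.48 s gives 3.3 / 0.48 ≈ 6.9. At 24 kHz mono, 3.3 s corresponds to 79,200 samples.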

	## Credits

	Original model and training by [@hexgrad](https://huggingface.co/hexgrad). Thanks!