	---
	license: apache-2.0
	pipeline_tag: text-to-speech
	library_name: ZONOS2
	---

	# ZONOS2

	<p align="center">
	<img src="./assets/ZONOS2BlogThumbnail.png" alt="ZONOS2 title card" width="750" />
	</p>

	<div align="center">
	<a href="https://discord.gg/gTW9JwST8q" target="_blank">
	<img src="https://img.shields.io/badge/Join%20Our%20Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white" alt="Discord">
	</a>
	</div>

	---


	ZONOS2 is our latest text-to-speech model, trained on more than 6 million hours of varied multilingual speech. Its mixture-of-experts (MoE) backbone delivers expressiveness and quality on par with, or even surpassing, top TTS providers at low latency. ZONOS2 excels at high-fidelity, naturalistic voice cloning.


	During inference, the input text is normalized with NeMo text normalization (TN) and encoded as UTF-8 bytes. These bytes, together with an ECAPA-TDNN embedding, are fed to our MoE backbone, which generates DAC tokens. An inference overview is shown below.
	<p align="center">
	<img src="./assets/zonos2_arlooop_animated.gif" alt="ZONOS2 inference overview" width="750" />
	</p>
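
To make the byte-level text input concrete, here is a minimal sketch of turning a string into a UTF-8 byte sequence. It only illustrates the encoding step: the NeMo normalization pass is omitted, and how the model maps bytes to token IDs (offsets, special tokens) is not described in this README. The helper name `text_to_utf8_bytes` and the plain 0-255 integer output are assumptions made for illustration.

```python
def text_to_utf8_bytes(text: str) -> list[int]:
    """Encode text as a list of UTF-8 byte values (each in 0..255)."""
    return list(text.encode("utf-8"))


# ASCII characters take one byte each; other scripts take two to four.
print(text_to_utf8_bytes("Hi"))           # [72, 105]
print(len(text_to_utf8_bytes("日本語")))  # 9 (three bytes per character)
```

The same encoding step applies to any script, so no language-specific vocabulary is needed at this stage.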

	Language support is as follows:

	| Tier | Languages |
	| ------ | --------- |
	| Tier 1 | English, Mandarin Chinese, Japanese |
	| Tier 2 | Korean, Russian, Italian, Portuguese, French, Spanish, Vietnamese, German, Hebrew, Dutch |
	| Tier 3 | Swedish, Hindi, Tamil, Telugu, Thai, Norwegian, Bengali, Tagalog, Arabic, Danish, Indonesian, Polish, Ukrainian, Romanian, Finnish, Hungarian, Lithuanian, Estonian, Slovak, Croatian, Latvian |


	For local inference we provide a high-performance TTS inference server built on [Mini-SGLang](https://github.com/sgl-project/mini-sglang).

	For more details and speech samples, check out our [blog](https://www.zyphra.com/our-work/zonos2).

	We also have a hosted version available at [cloud.zyphra.com/audio-playground](https://cloud.zyphra.com/audio-playground).

	---

	## Quick Start

	> Platform Support: Linux only (x86_64). Requires an NVIDIA GPU and a CUDA toolkit that matches your driver version (run `nvidia-smi` to check).
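
If you want to check these requirements from a script before installing, a rough preflight sketch could look like the following. The function `preflight_problems` is a hypothetical helper, not part of this repository. It only checks the OS, the CPU architecture, and whether `nvidia-smi` is on `PATH`; it does not verify that the CUDA toolkit matches the driver.

```python
import platform
import shutil


def preflight_problems(system=None, machine=None, has_nvidia_smi=None):
    """Return a list of problems found; an empty list means the basics look fine."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_nvidia_smi is None:
        # nvidia-smi ships with the NVIDIA driver, so a missing binary
        # suggests that no usable driver is installed.
        has_nvidia_smi = shutil.which("nvidia-smi") is not None

    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS: {system} (Linux only)")
    if machine != "x86_64":
        problems.append(f"unsupported architecture: {machine} (x86_64 only)")
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found on PATH")
    return problems


for problem in preflight_problems():
    print("warning:", problem)
```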

	### 1. Installation

	Requires [uv](https://docs.astral.sh/uv/getting-started/installation/).

	```bash
	git clone https://github.com/Zyphra/ZONOS2.git
	cd ZONOS2
	uv sync
	```

	### 2. Launch the TTS Server

	```bash
	uv run python -m minisgl --model-path Zyphra/ZONOS2 --tts-default-voices-dir ./default_voices/
	```

	`uv run` always uses the project environment, so no venv activation is needed.

	The server starts on `http://localhost:1919` by default. TTS mode is auto-detected for zonos2 models.
	`--tts-default-voices-dir <folder>` pre-populates the web UI with voice-clone
	speakers from disk; the folder is scanned recursively for speaker audio
	(`.wav`, `.mp3`, `.flac`, `.m4a`, `.ogg`, `.opus`, `.aac`, `.webm`) and saved
	embeddings (`.npy`, `.npz`). The newest voice is selected automatically on
	startup.
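
To preview which files in a folder match the extensions listed above, a small sketch along these lines may help. It is not the server's actual scanner: `find_voice_files` is a hypothetical helper, matching extensions case-insensitively is an assumption, and so is treating "newest" as "most recently modified".

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".opus", ".aac", ".webm"}
EMBEDDING_EXTS = {".npy", ".npz"}


def find_voice_files(folder):
    """Recursively collect files with a documented extension, newest first."""
    wanted = AUDIO_EXTS | EMBEDDING_EXTS
    matches = [
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted  # assumption: case-insensitive
    ]
    # Assumption: "newest" means the most recent modification time.
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)
```

Calling `find_voice_files("./default_voices/")` returns a list of `Path` objects; under the assumptions above, the first entry is the likely default voice.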

	### 3. Generate Speech

	curl:

	```bash
	curl -X POST http://localhost:1919/tts/generate \
	-H "Content-Type: application/json" \
	-d '{"text": "Hello world", "stream": true}' \
	--output output.pcm

	# Convert to WAV
	ffmpeg -f f32le -ar 44100 -ac 1 -i output.pcm output.wav
	```
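
The same request and conversion can be sketched in Python using only the standard library. This assumes, based on the ffmpeg flags above, that the response body is raw mono little-endian float32 PCM at 44.1 kHz. The helper names `build_tts_request` and `f32le_to_wav` are illustrative and not part of the server or its client API.

```python
import json
import struct
import urllib.request
import wave

SAMPLE_RATE = 44100  # matches `-ar 44100` in the ffmpeg command


def build_tts_request(text, base_url="http://localhost:1919"):
    """Build the same POST request as the curl example (nothing is sent yet)."""
    body = json.dumps({"text": text, "stream": True}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tts/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def f32le_to_wav(pcm, wav_path, sample_rate=SAMPLE_RATE):
    """Write mono little-endian float32 PCM bytes as a 16-bit WAV file."""
    n = len(pcm) // 4  # 4 bytes per sample; a trailing partial sample is dropped
    floats = struct.unpack(f"<{n}f", pcm[: n * 4])
    # Clamp to [-1, 1], then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, x)) * 32767) for x in floats]
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(struct.pack(f"<{n}h", *ints))


# Usage, with the server running:
#   with urllib.request.urlopen(build_tts_request("Hello world")) as resp:
#       f32le_to_wav(resp.read(), "output.wav")
```

Note that `resp.read()` buffers the whole response before converting, so this sketch does not take advantage of streaming.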

	Web UI: Open `http://localhost:1919/` in your browser.

	## Python API (offline inference)

	You can also run the engine directly in a Python script, without starting a
	server, via `TTSLLM`:

	```python
	from minisgl.message import TTSSamplingParams
	from minisgl.tts import TTSLLM

	# Load the model once; it is reused for every generate() call.
	tts = TTSLLM(model_path="Zyphra/ZONOS2")

	# Pass a list of prompts to synthesize them as a batch.
	results = tts.generate(
	    ["Hello from the offline Python API.", "Batched prompts work too."],
	    TTSSamplingParams(seed=42),
	)

	# Each result holds the generated audio tokens and the decoded audio.
	for i, result in enumerate(results):
	    print(f"frames={len(result['audio_tokens'])}, eos_frame={result['eos_frame']}")
	    tts.save_audio(result["audio"], f"output_{i}.wav")
	```


	## Citation
	If you find this model useful in an academic context, please cite it as:
	```
	@misc{zyphra2025zonos,
	  title  = {Zonos V2 Technical Report},
	  author = {Gabriel Clark and Sofian Mejjoute and Mohamed Osman and George Close and Beren Millidge},
	  year   = {2026},
	}
	```
