ZONOS2-GGUF

GGUF builds of Zyphra/ZONOS2, an ~8B-parameter (SonicMoE, ~900M active) real-time text-to-speech model with voice cloning, for mistral.rs.

Files

| File | Size | Use |
|------|------|-----|
| ZONOS2-Q8_0.gguf | 8.27 GB | Recommended. Near-lossless and the smallest coherent quantization. |
| ZONOS2-F16.gguf | 15.3 GB | Full precision, for maximum fidelity / reference. |

Quantization: Q8_0 is the floor for this architecture. Smaller quants (Q6_K and below) are incoherent, because ZONOS2's SonicMoE residual structure amplifies low-bit error. Use Q8_0 or F16.
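As a rough cross-check of the file sizes listed above, the sketch below estimates the parameter count each file implies. It assumes every tensor is stored in the file's nominal type (real GGUF files often keep a few tensors at higher precision) and that sizes are decimal gigabytes; the helper name `implied_params` is illustrative only. Q8_0 stores 32 weights per block as 32 int8 values plus one fp16 scale, which works out to 8.5 bits per weight.

```python
# Rough size check: parameter count implied by each file size.
# Assumption: all tensors use the file's nominal type, and 1 GB = 1e9 bytes.

Q8_0_BLOCK_WEIGHTS = 32       # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 2 + 32     # one fp16 scale + 32 int8 values

BITS_PER_WEIGHT = {
    "Q8_0": Q8_0_BLOCK_BYTES * 8 / Q8_0_BLOCK_WEIGHTS,  # 8.5 bits per weight
    "F16": 16.0,
}


def implied_params(file_size_gb: float, quant: str) -> float:
    """Return the parameter count, in billions, implied by a file size in GB."""
    total_bits = file_size_gb * 1e9 * 8
    return total_bits / BITS_PER_WEIGHT[quant] / 1e9


for name, size_gb, quant in [
    ("ZONOS2-Q8_0.gguf", 8.27, "Q8_0"),
    ("ZONOS2-F16.gguf", 15.3, "F16"),
]:
    print(f"{name}: ~{implied_params(size_gb, quant):.2f}B parameters implied")
```

Both estimates land a little under 8B, which is consistent with the stated ~8B total parameter count.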

Usage (mistral.rs)

Serve the model:

mistralrs-server -p 8080 speech --arch zonos2 --model-id ZONOS2-Q8_0.gguf

Generate speech via the OpenAI-compatible /v1/audio/speech endpoint. For voice cloning, pass a speaker embedding:

curl localhost:8080/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "default",
        "input": "Your text here.",
        "speaker_embedding": [ /* 2048-dim speaker vector */ ],
        "response_format": "wav"
      }' --output out.wav
  • speaker_embedding: the [2048]-dim speaker vector (e.g. from an ECAPA speaker encoder). Omit it for unconditional generation.
  • Tip: punctuate naturally; short clauses help prosody and clean utterance termination.
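The same request can be sent from Python. The following is a minimal sketch using only the standard library; it mirrors the fields of the curl call above (`model`, `input`, `speaker_embedding`, `response_format`). The function names `build_speech_payload` and `synthesize` are hypothetical helpers, and the base URL assumes the server was started on port 8080 as shown earlier.

```python
import json
import urllib.request

SPEAKER_DIM = 2048  # speaker vector length stated in this model card


def build_speech_payload(text, speaker_embedding=None, response_format="wav"):
    """Build the JSON body for POST /v1/audio/speech."""
    payload = {
        "model": "default",
        "input": text,
        "response_format": response_format,
    }
    # Leaving the embedding out requests unconditional generation.
    if speaker_embedding is not None:
        if len(speaker_embedding) != SPEAKER_DIM:
            raise ValueError(
                f"expected {SPEAKER_DIM} values, got {len(speaker_embedding)}"
            )
        payload["speaker_embedding"] = [float(x) for x in speaker_embedding]
    return payload


def synthesize(text, out_path, speaker_embedding=None,
               base_url="http://localhost:8080"):
    """POST the request and write the returned audio bytes to out_path."""
    body = json.dumps(build_speech_payload(text, speaker_embedding)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())


# Example call (requires a running server):
# synthesize("Your text here.", "out.wav")
```

Validating the embedding length on the client side catches a wrong-sized vector before the request is sent.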

Model

  • Architecture: ZONOS2 (Zyphra), SonicMoE decoder + DAC neural codec.
  • Output: 16-bit mono PCM, 44.1 kHz.
  • Voice cloning: via a per-request speaker embedding.
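To confirm that a generated file matches the stated output format, the standard-library `wave` module can read the header. This is a sketch that assumes the response is a regular RIFF/WAV file with a correct length field; `describe_wav` and `matches_stated_format` are illustrative names. The demo at the end builds one second of silence in memory so the snippet runs without a server.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, bits_per_sample, sample_rate, seconds) for a WAV file.

    `source` may be a path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    return channels, bits, rate, seconds


def matches_stated_format(source):
    """True if the file is 16-bit mono PCM at 44.1 kHz."""
    channels, bits, rate, _ = describe_wav(source)
    return (channels, bits, rate) == (1, 16, 44100)


# Self-contained demo: one second of 16-bit mono silence at 44.1 kHz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 44100)
buffer.seek(0)
print(describe_wav(buffer))
```

For a real result, pass the path written by the request, for example `matches_stated_format("out.wav")`.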

Provenance

Converted from the upstream Zyphra/ZONOS2 checkpoint with mistralrs-modeltool zonos2-gguf.
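A downloaded file can be sanity-checked by reading the fixed GGUF header. The sketch below assumes a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. `read_gguf_header` is a hypothetical helper, and the demo writes a synthetic header to a temporary file rather than reading the real model.

```python
import os
import struct
import tempfile


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # "<IQQ": little-endian uint32 version, then two uint64 counts.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


# Demo with a synthetic header (version 3, 2 tensors, 1 metadata entry).
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<IQQ", 3, 2, 1))
demo_header = read_gguf_header(tmp.name)
os.remove(tmp.name)
print(demo_header)
```

On a real file, call `read_gguf_header("ZONOS2-Q8_0.gguf")`; a `ValueError` suggests an incomplete or corrupted download.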
