| --- |
| license: apache-2.0 |
| pipeline_tag: text-to-speech |
| library_name: ZONOS2 |
| tags: |
| - zonos2 |
| - text-to-speech |
| - voice-clone |
| - clone |
| - voice |
| - tts |
| - comfyui |
| - fp8 |
| - mixed-fp8 |
| - bf16 |
| - e4m3 |
| - safetensors |
| --- |
| |
| # ZONOS2-FP8 |
|
|
| This repository provides a **mixed FP8 Safetensors conversion** of the original [Zyphra/ZONOS2](https://huggingface.co/Zyphra/ZONOS2) model for use with the [ZONOS2 TTS ComfyUI custom node](https://github.com/Saganaki22/Zonos2_TTS-ComfyUI). |
|
|
| The model was converted from the original PyTorch checkpoint format to `.safetensors` and quantized using a conservative mixed-precision policy. Only selected MoE expert projection weights were converted to FP8 E4M3, while the precision-sensitive parts of the model were kept in BF16 for stability and output quality. |
|
|
|  |
|
|
| ## Original Project |
|
|
| ZONOS2 is a text-to-speech model from Zyphra trained on more than 6 million hours of varied multilingual speech. It supports expressive speech generation and high-fidelity voice cloning. |
|
|
|  |
|
|
| ## ComfyUI Custom Node |
|
|
| This model package is intended for use with: |
|
|
| - https://github.com/Saganaki22/Zonos2_TTS-ComfyUI |
| |
| The ComfyUI node provides native ZONOS2 text-to-speech, audio-only voice cloning, mixed FP8 loading, BF16 compute support, SDPA/FlashAttention inference, progress reporting, and ComfyUI/AIMDO model-management integration. |
| |
| ## Model File |
| |
| Main model file: |
| |
| - `zonos2-fp8-mixed.safetensors` |
| |
| Direct download: |
| |
| - https://huggingface.co/drbaph/ZONOS-FP8/resolve/main/zonos2-fp8-mixed.safetensors?download=true |
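For scripted setups, the direct link above can be fetched with the Python standard library. This is a minimal sketch, not part of the ComfyUI node: `checkpoint_path` and `download_checkpoint` are hypothetical helper names, and the `models/zonos2` destination is taken from the storage layout described in this README.

```python
import os
import urllib.request

# URL copied from the "Direct download" link in this README.
CHECKPOINT_URL = (
    "https://huggingface.co/drbaph/ZONOS-FP8/resolve/main/"
    "zonos2-fp8-mixed.safetensors?download=true"
)
CHECKPOINT_NAME = "zonos2-fp8-mixed.safetensors"


def checkpoint_path(comfyui_root: str) -> str:
    """Return where the checkpoint should live inside a ComfyUI install."""
    return os.path.join(comfyui_root, "models", "zonos2", CHECKPOINT_NAME)


def download_checkpoint(comfyui_root: str) -> str:
    """Download the checkpoint if it is not already present (hypothetical helper)."""
    target = checkpoint_path(comfyui_root)
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Download to a temporary name first so an interrupted transfer
        # does not leave a truncated file under the final name.
        partial = target + ".part"
        urllib.request.urlretrieve(CHECKPOINT_URL, partial)
        os.replace(partial, target)
    return target
```

Calling `download_checkpoint("ComfyUI")` from the directory that contains your ComfyUI install would place the file under `ComfyUI/models/zonos2/`. The checkpoint is several GiB, so expect the transfer to take a while.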
| |
| ## Model Storage Location |
| |
| Place the model and required assets under: |
| |
| ComfyUI/ |
| └── models/ |
|     └── zonos2/ |
|         ├── zonos2-fp8-mixed.safetensors |
|         ├── dac_44khz/ |
|         └── speaker_encoder/ |
| |
| Expected layout: |
|
|
| ComfyUI/models/zonos2/ |
| ├── zonos2-fp8-mixed.safetensors |
| ├── dac_44khz/ |
| │   ├── config.json |
| │   ├── model.safetensors |
| │   └── preprocessor_config.json |
| └── speaker_encoder/ |
|     ├── config.json |
|     ├── model.safetensors |
|     └── preprocessor_config.json |
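Before launching ComfyUI, the layout above can be checked with a few lines of Python. The sketch below is illustrative only: `missing_assets` is a hypothetical helper, and the file list is copied from the expected layout rather than from the node's own loader.

```python
from pathlib import Path

# Relative paths taken from the expected layout in this README.
EXPECTED_FILES = [
    "zonos2-fp8-mixed.safetensors",
    "dac_44khz/config.json",
    "dac_44khz/model.safetensors",
    "dac_44khz/preprocessor_config.json",
    "speaker_encoder/config.json",
    "speaker_encoder/model.safetensors",
    "speaker_encoder/preprocessor_config.json",
]


def missing_assets(zonos2_dir):
    """Return the expected files that are absent from `zonos2_dir`."""
    root = Path(zonos2_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

For example, `missing_assets("ComfyUI/models/zonos2")` returns an empty list when every file is in place, and otherwise lists the relative paths that still need to be downloaded.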
| |
| If `download_if_missing` is enabled in the ComfyUI node, missing assets can be downloaded automatically. |
|
|
| ## Usage |
|
|
| Install the ComfyUI custom node: |
|
|
| cd ComfyUI/custom_nodes |
| git clone https://github.com/Saganaki22/Zonos2_TTS-ComfyUI.git |
| |
| Then restart ComfyUI and load the **ZONOS2 FP8 Mixed** model from the node loader. |
|
|
| Recommended dtype settings for this checkpoint: |
|
|
| - `dtype: auto` |
| - `dtype: bf16` |
|
|
| The mixed FP8 checkpoint does not use the `fp16` runtime option. |
|
|
| ## Quantization Details |
|
|
| This checkpoint was quantized as a **mixed FP8/BF16 model**. |
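One way to see the FP8/BF16 split for yourself is to read the Safetensors header, which records a dtype string for every tensor. The sketch below relies only on the published Safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the function names are hypothetical, and this checkpoint may contain dtypes other than `F8_E4M3` and `BF16` (for example, for scale factors).

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def dtype_counts(path):
    """Count tensors per stored dtype, skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return Counter(
        entry["dtype"] for name, entry in header.items() if name != "__metadata__"
    )
```

Running `dtype_counts("zonos2-fp8-mixed.safetensors")` should show how many tensors are stored under each dtype string. Only the header is read, so this works without enough RAM to load the weights.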
|
|
| The quantization policy is deliberately conservative: |
|
|
| - **Converted to FP8 E4M3** |
|   - MoE expert gate/up projection weights |
|   - Specifically the expert `w13` projections |
|
| - **Left in BF16** |
|   - Attention layers |
|   - Dense feed-forward layers |
|   - Expert-down projections, `w2` |
|   - LM head |
|   - Routers |
|   - Token embeddings |
|   - Speaker embeddings and speaker projections |
|   - Normalization layers |
|   - Biases |
|   - Temperatures |
|   - Other precision-sensitive paths |
|
|
| In short, the large MoE expert gate/up weights were quantized to FP8 E4M3, while the parts most likely to affect stability, routing, speaker identity, and generation quality were kept in BF16. |
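The policy above amounts to a name-based allowlist: only expert `w13` weights go to FP8, and everything else stays in BF16. The sketch below illustrates that rule and is not the actual conversion script. The dotted key layout (such as `layers.3.moe.experts.w13.weight`) is an assumption made for the example, and real checkpoint keys may differ.

```python
def target_dtype(tensor_name: str) -> str:
    """Pick a storage dtype for a tensor under the policy described above.

    The key layout ("...experts.w13.weight") is hypothetical; real checkpoint
    keys may differ.
    """
    parts = tensor_name.split(".")
    is_weight = parts[-1] == "weight"
    is_expert_gate_up = "w13" in parts
    # Allowlist: only expert gate/up projection weights are quantized.
    if is_weight and is_expert_gate_up:
        return "fp8_e4m3"
    # Everything else (attention, w2, routers, norms, biases, ...) stays BF16.
    return "bf16"
```

An allowlist fails safe: any tensor the rule does not recognize is left in BF16, which matches the conservative intent of the policy.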
|
|
| This reduces the main checkpoint from approximately **14.28 GiB** for the BF16 version to approximately **9.78 GiB** for the mixed FP8 version. |
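The saving can be worked out directly from the two sizes quoted above. The weight-count estimate in this sketch is only a rough inference: it assumes each quantized weight shrinks from 2 bytes (BF16) to 1 byte (FP8) and ignores any scale tensors or header overhead.

```python
GIB = 2**30

bf16_gib = 14.28   # approximate BF16 checkpoint size stated above
mixed_gib = 9.78   # approximate mixed FP8 checkpoint size stated above

saved_gib = bf16_gib - mixed_gib
saved_fraction = saved_gib / bf16_gib

# Assumption: one byte saved per quantized weight (2-byte BF16 -> 1-byte FP8),
# so the number of bytes saved approximates the number of quantized weights.
approx_quantized_weights = saved_gib * GIB

print(f"saved: {saved_gib:.2f} GiB ({saved_fraction:.1%})")
print(f"approx. quantized weights: {approx_quantized_weights / 1e9:.2f} billion")
```

With the quoted sizes, this gives a saving of about 4.50 GiB, or roughly 31.5% of the BF16 checkpoint, which under the stated assumption corresponds to about 4.83 billion quantized weights.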
|
|
| The mixed FP8 checkpoint is primarily a **memory-saving option**. It is not guaranteed to generate faster than BF16 on every GPU or ComfyUI setup. |
|
|
| ## Notes |
|
|
| - This repository is a mixed FP8 Safetensors package of the original ZONOS2 model. |
| - The model architecture and original weights come from Zyphra/ZONOS2. |
| - This package is provided for ComfyUI compatibility and convenience. |
| - Mixed FP8 support requires the current ZONOS2 TTS ComfyUI custom node. |
| - Voice cloning should only be used with voices you own or have explicit permission to use. |
|
|
| ## License |
|
|
| The original ZONOS2 model is released under the Apache License 2.0. |
|
|
| This converted mixed FP8 Safetensors package follows the same model license. |
|
|
| ## Responsible Use |
|
|
| Do not use this model for malicious impersonation, fraud, deception, harassment, non-consensual voice cloning, or any use intended to cause harm. |
|
|
| Only clone voices you own or have explicit permission to use. |
|
|
| ## Citation |
|
|
| If you find this model useful in an academic context, please cite the original ZONOS2 work: |
|
|
| @misc{zyphra2025zonos, |
|   title  = {Zonos V2 Technical Report}, |
|   author = {Gabriel Clark and Sofian Mejjoute and Mohamed Osman and George Close and Beren Millidge}, |
|   year   = {2026}, |
| } |
| |
| ## Credits |
|
|
| - Original model: https://github.com/Zyphra/ZONOS2 |
| - Original Hugging Face repository: https://huggingface.co/Zyphra/ZONOS2 |
| - Mixed FP8 Safetensors package: https://huggingface.co/drbaph/ZONOS-FP8 |
| - BF16 Safetensors package: https://huggingface.co/drbaph/ZONOS2-BF16 |
| - ComfyUI custom node: https://github.com/Saganaki22/Zonos2_TTS-ComfyUI |