	---
	license: apache-2.0
	pipeline_tag: text-to-speech
	library_name: ZONOS2
	tags:
	- zonos2
	- text-to-speech
	- voice-clone
	- clone
	- voice
	- tts
	- comfyui
	- fp8
	- mixed-fp8
	- bf16
	- e4m3
	- safetensors
	---

	# ZONOS2-FP8

	This repository provides a mixed FP8 Safetensors conversion of the original [Zyphra/ZONOS2](https://huggingface.co/Zyphra/ZONOS2) model for use with the [ZONOS2 TTS ComfyUI custom node](https://github.com/Saganaki22/Zonos2_TTS-ComfyUI).

	The model was converted from the original PyTorch checkpoint format to `.safetensors` and quantized using a conservative mixed-precision policy. Only selected MoE expert projection weights were converted to FP8 E4M3, while the precision-sensitive parts of the model were kept in BF16 for stability and output quality.

	![Screenshot 2026-06-12 214924](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/aWr3cHXPHERGSyYG2hvMT.png)

	## Original Project

	ZONOS2 is a text-to-speech model from Zyphra trained on more than 6 million hours of varied multilingual speech. It supports expressive speech generation and high-fidelity voice cloning.

	![zonos2](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/Aiw3SXr27w4SVT4g-QSOt.gif)

	## ComfyUI Custom Node

	This model package is intended for use with:

	- https://github.com/Saganaki22/Zonos2_TTS-ComfyUI

	The ComfyUI node provides native ZONOS2 text-to-speech, audio-only voice cloning, mixed FP8 loading, BF16 compute support, SDPA/FlashAttention inference, progress reporting, and ComfyUI/AIMDO model-management integration.

	## Model File

	Main model file:

	- `zonos2-fp8-mixed.safetensors`

	Direct download:

	- https://huggingface.co/drbaph/ZONOS-FP8/resolve/main/zonos2-fp8-mixed.safetensors?download=true
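For scripted setups, the direct link above can be rebuilt and fetched with the Python standard library. This is a minimal sketch: `resolve_url` and `download_model` are illustrative helper names, not part of the custom node, and the target directory is whatever you pass in.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve

REPO_ID = "drbaph/ZONOS-FP8"               # repository from the direct-download link
FILENAME = "zonos2-fp8-mixed.safetensors"  # main model file


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a Hugging Face 'resolve' URL with the same shape as the direct link."""
    return f"https://huggingface.co/{repo_id}/resolve/{quote(revision)}/{quote(filename)}"


def download_model(target_dir: Path) -> Path:
    """Download the checkpoint into target_dir unless it is already present."""
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / FILENAME
    if not destination.exists():
        # Write to a temporary name first so an interrupted download
        # is not mistaken for a complete checkpoint on the next run.
        partial = destination.with_name(destination.name + ".part")
        urlretrieve(resolve_url(REPO_ID, FILENAME), partial)
        partial.replace(destination)
    return destination
```

Calling `download_model(Path("ComfyUI/models/zonos2"))` would place the file where the node expects it. The download is several GiB, so expect it to take a while.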

	## Model Storage Location

	Place the model and required assets under:

	ComfyUI/
	└── models/
	    └── zonos2/
	        ├── zonos2-fp8-mixed.safetensors
	        ├── dac_44khz/
	        └── speaker_encoder/

	Expected layout:

	ComfyUI/models/zonos2/
	├── zonos2-fp8-mixed.safetensors
	├── dac_44khz/
	│   ├── config.json
	│   ├── model.safetensors
	│   └── preprocessor_config.json
	└── speaker_encoder/
	    ├── config.json
	    ├── model.safetensors
	    └── preprocessor_config.json

	If `download_if_missing` is enabled in the ComfyUI node, missing assets can be downloaded automatically.
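If you place the files manually, a quick check of the expected layout can catch a misplaced asset before ComfyUI starts. The sketch below uses only the file names listed above; `missing_assets` is an illustrative helper, not something the custom node provides.

```python
from pathlib import Path

# Relative paths copied from the expected layout above.
EXPECTED_FILES = [
    "zonos2-fp8-mixed.safetensors",
    "dac_44khz/config.json",
    "dac_44khz/model.safetensors",
    "dac_44khz/preprocessor_config.json",
    "speaker_encoder/config.json",
    "speaker_encoder/model.safetensors",
    "speaker_encoder/preprocessor_config.json",
]


def missing_assets(model_dir: Path) -> list[str]:
    """Return the expected files that are not present under model_dir."""
    return [rel for rel in EXPECTED_FILES if not (model_dir / rel).is_file()]
```

For example, `missing_assets(Path("ComfyUI/models/zonos2"))` returns an empty list when everything is in place, and the relative paths of any absent files otherwise.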

	## Usage

	Install the ComfyUI custom node:

	cd ComfyUI/custom_nodes
	git clone https://github.com/Saganaki22/Zonos2_TTS-ComfyUI.git

	Then restart ComfyUI and load the ZONOS2 FP8 Mixed model from the node loader.

	Recommended dtype settings for this checkpoint:

	- `dtype: auto`
	- `dtype: bf16`

	The mixed FP8 checkpoint does not use the `fp16` runtime option.

	## Quantization Details

	This checkpoint was quantized as a mixed FP8/BF16 model.

	The quantization policy is deliberately conservative:

	- Converted to FP8 E4M3
	  - MoE expert gate/up projection weights
	  - Specifically the expert `w13` projections

	- Left in BF16
	  - Attention layers
	  - Dense feed-forward layers
	  - Expert-down projections, `w2`
	  - LM head
	  - Routers
	  - Token embeddings
	  - Speaker embeddings and speaker projections
	  - Normalization layers
	  - Biases
	  - Temperatures
	  - Other precision-sensitive paths

	In short, the large MoE expert gate/up weights were quantized to FP8 E4M3, while the parts most likely to affect stability, routing, speaker identity, and generation quality were kept in BF16.
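The policy above amounts to a per-tensor decision by name. The sketch below illustrates that idea only: the dotted tensor names in the examples are hypothetical, and the actual conversion script and checkpoint key names may differ.

```python
def target_dtype(tensor_name: str) -> str:
    """Pick a storage dtype for one tensor under the conservative policy.

    Assumption: tensor names are dot-separated and the expert gate/up
    projection carries a `w13` component. This naming is illustrative only.
    """
    parts = tensor_name.split(".")
    is_gate_up_weight = "w13" in parts and parts[-1] != "bias"
    # Everything that is not an expert gate/up weight stays in BF16:
    # attention, dense FFN, w2, LM head, routers, embeddings, norms, biases.
    return "fp8_e4m3" if is_gate_up_weight else "bf16"


# Hypothetical tensor names, used only to exercise the rule.
for name in (
    "layers.3.moe.experts.w13.weight",
    "layers.3.moe.experts.w2.weight",
    "layers.3.moe.router.weight",
):
    print(f"{name}: {target_dtype(name)}")
```

Defaulting to BF16 and opting in only the `w13` weights means an unrecognized tensor is never quantized by accident.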

	This reduces the main checkpoint from approximately 14.28 GiB for the BF16 version to approximately 9.78 GiB for the mixed FP8 version.
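The arithmetic behind those figures is worked through below. The last two lines are a rough estimate rather than a measured value: they assume the saving comes entirely from storing the quantized weights in 1 byte instead of 2, and that any extra scale tensors are negligible.

```python
BF16_GIB = 14.28       # approximate size of the BF16 checkpoint
FP8_MIXED_GIB = 9.78   # approximate size of the mixed FP8 checkpoint

saved_gib = BF16_GIB - FP8_MIXED_GIB
saved_fraction = saved_gib / BF16_GIB
print(f"saved: {saved_gib:.2f} GiB ({saved_fraction:.1%})")  # saved: 4.50 GiB (31.5%)

# Rough estimate: FP8 halves the storage of each quantized weight, so the
# quantized tensors occupied about twice the saving in the BF16 checkpoint.
quantized_bf16_gib = 2 * saved_gib
print(f"quantized share of BF16 checkpoint: {quantized_bf16_gib / BF16_GIB:.0%}")  # 63%
```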

	The mixed FP8 checkpoint is primarily a memory-saving option. It is not guaranteed to generate faster than BF16 on every GPU or ComfyUI setup.
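To see how much of a downloaded file is actually stored as FP8 versus BF16, the safetensors header can be read without loading any weights. A `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header that records each tensor's `dtype` and `data_offsets`. The helper name `dtype_summary` is illustrative.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def dtype_summary(path: Path) -> tuple[Counter, Counter]:
    """Return (tensor count, byte count) per dtype from a safetensors header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))

    counts, sizes = Counter(), Counter()
    for name, entry in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        start, end = entry["data_offsets"]
        counts[entry["dtype"]] += 1
        sizes[entry["dtype"]] += end - start
    return counts, sizes
```

For a mixed checkpoint, the result would be expected to show `F8_E4M3` entries alongside `BF16` ones. Only the header is read, so this runs quickly even on a multi-GiB file.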

	## Notes

	- This repository is a mixed FP8 Safetensors package of the original ZONOS2 model.
	- The model architecture and original weights come from Zyphra/ZONOS2.
	- This package is provided for ComfyUI compatibility and convenience.
	- Mixed FP8 support requires the current ZONOS2 TTS ComfyUI custom node.
	- Voice cloning should only be used with voices you own or have explicit permission to use.

	## License

	The original ZONOS2 model is released under the Apache License 2.0.

	This converted mixed FP8 Safetensors package follows the same model license.

	## Responsible Use

	Do not use this model for malicious impersonation, fraud, deception, harassment, non-consensual voice cloning, or any use intended to cause harm.

	Only clone voices you own or have explicit permission to use.

	## Citation

	If you find this model useful in an academic context, please cite the original ZONOS2 work:

	@misc{zyphra2025zonos,
	  title = {Zonos V2 Technical Report},
	  author = {Gabriel Clark and Sofian Mejjoute and Mohamed Osman and George Close and Beren Millidge},
	  year = {2026},
	}

	## Credits

	- Original model: https://github.com/Zyphra/ZONOS2
	- Original Hugging Face repository: https://huggingface.co/Zyphra/ZONOS2
	- Mixed FP8 Safetensors package: https://huggingface.co/drbaph/ZONOS-FP8
	- BF16 Safetensors package: https://huggingface.co/drbaph/ZONOS2-BF16
	- ComfyUI custom node: https://github.com/Saganaki22/Zonos2_TTS-ComfyUI