Instructions to use InsecureErasure/Z-Image-Turbo-NVFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
  - Diffusers

How to use InsecureErasure/Z-Image-Turbo-NVFP4 with Diffusers:

```bash
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch device_map to "mps" for Apple devices
pipe = DiffusionPipeline.from_pretrained(
    "InsecureErasure/Z-Image-Turbo-NVFP4",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
image.save("astronaut.png")  # images[0] is a PIL image
```

- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - Draw Things
  - DiffusionBee
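This commit updates the README's **Inference** requirement in the diff below to CUDA 13.0+, PyTorch 2.10+, `comfy-kitchen`, and a Blackwell GPU (RTX 50xx / B100 / B200). Before you try the Diffusers snippet above, you can run the following sketch as a pre-flight check against those numbers. The function names are hypothetical. The check treats compute capability 10.x or higher as Blackwell, which is a heuristic assumption: B100/B200 report 10.x and RTX 50xx report 12.x. The sketch does not check whether `comfy-kitchen` is installed.

```python
import re
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Leading numeric part of a version string, e.g. "2.10.0+cu130" -> (2, 10, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_inference_requirements(
    torch_version: str,
    cuda_version: Optional[str],
    capability: Tuple[int, int],
) -> List[str]:
    """Hypothetical helper: returns a list of problems; empty means all checks passed."""
    problems = []
    if parse_version(torch_version) < (2, 10):
        problems.append(f"PyTorch {torch_version} is older than 2.10")
    if cuda_version is None:
        problems.append("PyTorch was built without CUDA")
    elif parse_version(cuda_version) < (13, 0):
        problems.append(f"CUDA {cuda_version} is older than 13.0")
    # Assumption: Blackwell parts report compute capability 10.x (B100/B200)
    # or 12.x (RTX 50xx), so a major version below 10 is treated as pre-Blackwell.
    if capability[0] < 10:
        problems.append(
            f"compute capability {capability[0]}.{capability[1]} is pre-Blackwell"
        )
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        cap = torch.cuda.get_device_capability() if torch.cuda.is_available() else (0, 0)
        issues = check_inference_requirements(torch.__version__, torch.version.cuda, cap)
        print("\n".join(issues) or "Environment matches the listed inference requirements")
```

The version comparison uses plain tuple ordering. For example, `(2, 9, 1) < (2, 10)` holds, while `(2, 10, 0) < (2, 10)` does not.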
Update README.md
README.md (changed)

```diff
@@ -78,7 +78,7 @@ convert_to_quant -i $1 \
 --custom-type mxfp8 \
 --custom-layers "layers\.(10|16|26)\.attention\.qkv\.weight|layers\.(27|28)\.attention\.qkv\.weight|layers\.(0|1)\.attention\.out\.weight|layers\.(3|6|9|11|12|13|14|19|20|26)\.attention\.out\.weight|layers\.(27|28|29)\.attention\.out\.weight|layers\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26)\.feed_forward\.w1\.weight|layers\.(27|28|29)\.feed_forward\.w1\.weight|layers\.(16|17|18|19|20|21)\.adaLN_modulation\.0\.weight|context_refiner\.(0|1)\.attention\.qkv\.weight|context_refiner\.(0|1)\.feed_forward\.w1\.weight|context_refiner\.(0|1)\.feed_forward\.w2\.weight|context_refiner\.(0|1)\.feed_forward\.w3\.weight|noise_refiner\.(0)\.attention\.(qkv|out)\.weight|noise_refiner\.(0)\.feed_forward\.(w1|w2)\.weight|noise_refiner\.(1)\.feed_forward\.w1\.weight" \
 --exclude-layers "layers\.(29)\.attention\.qkv\.weight|layers\.(22|23|24|25|26)\.adaLN_modulation\.0\.weight|layers\.(27|28|29)\.adaLN_modulation\.0\.weight|context_refiner\.(0|1)\.attention\.out\.weight|context_refiner\.(1)\.feed_forward\.w2\.weight|noise_refiner\.(1)\.attention\.qkv\.weight|noise_refiner\.(1)\.attention\.out\.weight|noise_refiner\.(1)\.feed_forward\.w2\.weight|noise_refiner\.(0|1)\.feed_forward\.w3\.weight" \
---num-iter 6000 --top-p 0.35 --calib-samples 8192
+--num-iter 6000 --top-p 0.35 --calib-samples 8192 \
 --scale-optimization iterative --scale-refinement-rounds 2 \
 --extract-lora --lora-rank 32 \
 -o "${1%%.safetensors}-nvfp4.safetensors"
@@ -97,7 +97,7 @@ Use the LoRA at **1.5–2.0** strength in ComfyUI for maximum fidelity.
 
 ## Requirements
 
-- **Inference**: CUDA 13.0+, PyTorch 2.
+- **Inference**: CUDA 13.0+, PyTorch 2.10+, [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen), Blackwell GPU (RTX 50xx / B100 / B200)
 - **Generation**: `convert_to_quant >= 1.2.6`, `comfy-kitchen`
 
 ---
@@ -137,7 +137,7 @@ Recommendations were cross-referenced against the DiT quantization literature:
 
 ## Credits
 
-- Layer sensitivity analysis via [`quant_probe`](https://github.com/insecure-erasure/quant_probe)
 - Quantization engine: [`convert_to_quant`](https://github.com/silveroxides/convert_to_quant) by silveroxides
 - Z-Image Turbo model by [Tongyi-MAI](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
-- ComfyUI integration via [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen)
+- ComfyUI integration via [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen)
+- Layer sensitivity analysis via [`quant_probe`](https://github.com/insecure-erasure/quant_probe)
```
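The `--custom-layers` and `--exclude-layers` arguments in the `convert_to_quant` command above are regular-expression alternations over state-dict keys. The sketch below reuses those exact patterns to show which bucket a given tensor name lands in. `classify` is a hypothetical helper, not part of `convert_to_quant`, and it rests on three assumptions that this README does not confirm:

- Patterns are matched with Python's `re.search` against the key.
- An exclusion takes precedence over the custom list.
- Any weight matching neither keeps the default NVFP4 format suggested by the `-nvfp4` output name.

Excluded weights are presumably left in their source precision.

```python
import re

# Alternatives copied verbatim from the --custom-layers argument (quantized as mxfp8).
CUSTOM_LAYERS = re.compile("|".join([
    r"layers\.(10|16|26)\.attention\.qkv\.weight",
    r"layers\.(27|28)\.attention\.qkv\.weight",
    r"layers\.(0|1)\.attention\.out\.weight",
    r"layers\.(3|6|9|11|12|13|14|19|20|26)\.attention\.out\.weight",
    r"layers\.(27|28|29)\.attention\.out\.weight",
    r"layers\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26)\.feed_forward\.w1\.weight",
    r"layers\.(27|28|29)\.feed_forward\.w1\.weight",
    r"layers\.(16|17|18|19|20|21)\.adaLN_modulation\.0\.weight",
    r"context_refiner\.(0|1)\.attention\.qkv\.weight",
    r"context_refiner\.(0|1)\.feed_forward\.w1\.weight",
    r"context_refiner\.(0|1)\.feed_forward\.w2\.weight",
    r"context_refiner\.(0|1)\.feed_forward\.w3\.weight",
    r"noise_refiner\.(0)\.attention\.(qkv|out)\.weight",
    r"noise_refiner\.(0)\.feed_forward\.(w1|w2)\.weight",
    r"noise_refiner\.(1)\.feed_forward\.w1\.weight",
]))

# Alternatives copied verbatim from the --exclude-layers argument.
EXCLUDE_LAYERS = re.compile("|".join([
    r"layers\.(29)\.attention\.qkv\.weight",
    r"layers\.(22|23|24|25|26)\.adaLN_modulation\.0\.weight",
    r"layers\.(27|28|29)\.adaLN_modulation\.0\.weight",
    r"context_refiner\.(0|1)\.attention\.out\.weight",
    r"context_refiner\.(1)\.feed_forward\.w2\.weight",
    r"noise_refiner\.(1)\.attention\.qkv\.weight",
    r"noise_refiner\.(1)\.attention\.out\.weight",
    r"noise_refiner\.(1)\.feed_forward\.w2\.weight",
    r"noise_refiner\.(0|1)\.feed_forward\.w3\.weight",
]))


def classify(key: str) -> str:
    """Hypothetical helper mirroring the assumed selection rules.

    Assumptions: re.search semantics (so prefixes such as
    "model.diffusion_model." are tolerated), exclusion wins over the
    custom list, and unmatched weights get the default NVFP4 format.
    """
    if EXCLUDE_LAYERS.search(key):
        return "excluded"
    if CUSTOM_LAYERS.search(key):
        return "mxfp8"
    return "nvfp4"


if __name__ == "__main__":
    for key in [
        "layers.10.attention.qkv.weight",
        "layers.29.attention.qkv.weight",
        "layers.0.feed_forward.w2.weight",
        "context_refiner.1.feed_forward.w2.weight",
    ]:
        print(f"{key:45} -> {classify(key)}")
```

Under these assumptions, `context_refiner.1.feed_forward.w2.weight` matches both lists. It hits the custom entry `context_refiner\.(0|1)\.feed_forward\.w2` and also the exclusion `context_refiner\.(1)\.feed_forward\.w2`. Its actual format therefore depends on how `convert_to_quant` resolves overlaps.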