Update README.md
README.md
---
license: apache-2.0
language:
- en
- zh
base_model:
- Tongyi-MAI/Z-Image-Turbo
base_model_relation: quantized
pipeline_tag: text-to-image
library_name: diffusers
tags:
- comfyui
- quantization
- nvfp4
- txt2img
---

# Z-Image Turbo – NVFP4 Mixed-Precision

Surgical mixed-precision quantization of [Z-Image Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) (6B S3-DiT), generated with [`convert_to_quant`](https://github.com/silveroxides/convert_to_quant).

**Formats**: NVFP4 (baseline) + MXFP8 (sensitive layers) + BF16 (critical layers).
**Size**: 4.84 GB (−58% vs BF16).
**Inference**: ComfyUI + [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen), Blackwell GPU (RTX 50xx / B100 / B200).

Also available: [MXFP8 uniform quantization](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) (6.23 GB, near-lossless, simpler).

---

## Strategy

Uses per-layer sensitivity analysis via [`quant_probe`](https://github.com/insecure-erasure/quant_probe) and the DiT quantization literature (PTQ4DiT, ViDiT-Q, SemanticDialect, SVDQuant) to maximize quality-per-byte:

- **~190 tensors → NVFP4** (4-bit E2M1): baseline for most attention + FF weights
- **~100 tensors → MXFP8** (8-bit E4M3 + E8M0): attention outputs, gate projections (w1), mid-block adaLN
- **~20 tensors → BF16**: last QKV, late adaLN modulations, refiner outputs
- **~110 tensors → BF16**: norms, biases, embeddings (auto-excluded by `--zimage`)

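
The byte cost of each format in the list above can be estimated from its block layout. The sketch below assumes the standard layouts (NVFP4: 4-bit E2M1 elements with one 8-bit scale per 16-element block; MXFP8: 8-bit E4M3 elements with one 8-bit E8M0 scale per 32-element block) and ignores per-tensor scales and file metadata. The helper names are illustrative and are not part of `convert_to_quant`.

```python
# Effective storage cost per weight for block-scaled formats.
# Assumed layouts: NVFP4 = 4-bit elements + 8-bit scale per 16 elements,
# MXFP8 = 8-bit elements + 8-bit scale per 32 elements.
# Per-tensor scales and safetensors metadata are ignored.


def bits_per_weight(element_bits: int, scale_bits: int, block_size: int) -> float:
    """Element width plus the shared block scale amortized over the block."""
    return element_bits + scale_bits / block_size


NVFP4_BITS = bits_per_weight(element_bits=4, scale_bits=8, block_size=16)
MXFP8_BITS = bits_per_weight(element_bits=8, scale_bits=8, block_size=32)
BF16_BITS = 16.0


def size_gb(num_weights: float, bits: float) -> float:
    """Storage in decimal gigabytes for num_weights values at the given width."""
    return num_weights * bits / 8 / 1e9


if __name__ == "__main__":
    for label, bits in (("NVFP4", NVFP4_BITS), ("MXFP8", MXFP8_BITS), ("BF16", BF16_BITS)):
        print(f"{label}: {bits} bits/weight, {size_gb(1e9, bits):.3f} GB per billion weights")
```

Under these assumptions NVFP4 costs 4.5 bits per weight and MXFP8 costs 8.25, so each tensor promoted from NVFP4 to MXFP8 takes a little under twice as many bytes.
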
### MXFP8-protected layers

| Category | Blocks | Layers |
|---|---|---|
| Early attention outputs | 0, 1 | `attention.out` |
| Selected QKV projections | 10, 16, 26, 27, 28 | `attention.qkv` |
| Attention outputs | 3, 6, 9, 11–14, 19, 20, 26–29 | `attention.out` |
| Gate projections (w1) | 3–29 | `feed_forward.w1` |
| Mid-block modulations | 16–21 | `adaLN_modulation.0` |

### BF16-protected layers

| Category | Layers | Reason |
|---|---|---|
| Last QKV | `layers.29.attention.qkv` | Feeds directly into `final_layer`; no downstream compensation |
| Late modulations | `layers.(22–29).adaLN_modulation.0` | Controls scale/shift of features near output |
| Refiner attention outputs | `context_refiner.(0\|1).attention.out` | Only 2 refiner blocks; outputs have outsized impact |
| Selected refiner FF | `context_refiner.1.w2`, `noise_refiner.1.{qkv,out,w2}` | Critical single-block projections |
| Refiner up-projections | `noise_refiner.(0\|1).w3` | Noise refiner w3 expands features; direct output |

### Refiner sub-graphs

| Sub-graph | Block 0 | Block 1 |
|---|---|---|
| `context_refiner` | qkv + w1 + w2 + w3 MXFP8, out BF16 | qkv + w1 + w3 MXFP8, out + w2 BF16 |
| `noise_refiner` | qkv + out + w1 + w2 MXFP8, w3 BF16 | qkv + out + w2 + w3 BF16, w1 MXFP8 |

---

## Generation

```bash
#!/bin/bash
# Usage: pass the source .safetensors checkpoint as $1.
# NVFP4 baseline + MXFP8 for sensitive layers + BF16 at critical points.
# Refiners: block 0 mostly MXFP8 (context out and noise w3 in BF16); block 1 outputs kept in BF16.
# Last QKV (layer 29), late adaLN (22-29), and refiner outputs in BF16.
# All main-trunk w1 (gate) projections in MXFP8.
# Note: context_refiner.1 w2 matches both lists; the tables in this README list it as BF16.
convert_to_quant -i "$1" \
  --nvfp4 --zimage --comfy_quant --save-quant-metadata \
  --custom-type mxfp8 \
  --custom-layers "layers\.(10|16|26)\.attention\.qkv\.weight|layers\.(27|28)\.attention\.qkv\.weight|layers\.(0|1)\.attention\.out\.weight|layers\.(3|6|9|11|12|13|14|19|20|26)\.attention\.out\.weight|layers\.(27|28|29)\.attention\.out\.weight|layers\.(3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26)\.feed_forward\.w1\.weight|layers\.(27|28|29)\.feed_forward\.w1\.weight|layers\.(16|17|18|19|20|21)\.adaLN_modulation\.0\.weight|context_refiner\.(0|1)\.attention\.qkv\.weight|context_refiner\.(0|1)\.feed_forward\.w1\.weight|context_refiner\.(0|1)\.feed_forward\.w2\.weight|context_refiner\.(0|1)\.feed_forward\.w3\.weight|noise_refiner\.(0)\.attention\.(qkv|out)\.weight|noise_refiner\.(0)\.feed_forward\.(w1|w2)\.weight|noise_refiner\.(1)\.feed_forward\.w1\.weight" \
  --exclude-layers "layers\.(29)\.attention\.qkv\.weight|layers\.(22|23|24|25|26)\.adaLN_modulation\.0\.weight|layers\.(27|28|29)\.adaLN_modulation\.0\.weight|context_refiner\.(0|1)\.attention\.out\.weight|context_refiner\.(1)\.feed_forward\.w2\.weight|noise_refiner\.(1)\.attention\.qkv\.weight|noise_refiner\.(1)\.attention\.out\.weight|noise_refiner\.(1)\.feed_forward\.w2\.weight|noise_refiner\.(0|1)\.feed_forward\.w3\.weight" \
  --num-iter 6000 --top-p 0.35 --calib-samples 8192 --manual-seed 42 \
  --scale-optimization iterative --scale-refinement-rounds 2 \
  --extract-lora --lora-rank 32 \
  -o "${1%%.safetensors}-nvfp4.safetensors"
```

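
The layer-selection patterns in the command can be checked against tensor names before starting a long conversion. The following is a minimal sketch, not `convert_to_quant`'s actual matching code: it uses abbreviated main-trunk versions of the two patterns, and it assumes that exclusions take precedence over `--custom-type` matches and that patterns are applied with search semantics. `assign_format` is a hypothetical helper, and it only makes sense for quantizable linear weights (norms, biases, and embeddings are handled separately by `--zimage`).

```python
import re

# Abbreviated from the --exclude-layers pattern (main-trunk entries only).
EXCLUDE_BF16 = re.compile(
    r"layers\.(29)\.attention\.qkv\.weight"
    r"|layers\.(22|23|24|25|26|27|28|29)\.adaLN_modulation\.0\.weight"
)

# Abbreviated from the --custom-layers pattern (main-trunk entries only).
CUSTOM_MXFP8 = re.compile(
    r"layers\.(10|16|26|27|28)\.attention\.qkv\.weight"
    r"|layers\.(0|1|3|6|9|11|12|13|14|19|20|26|27|28|29)\.attention\.out\.weight"
    r"|layers\.([3-9]|1[0-9]|2[0-9])\.feed_forward\.w1\.weight"
    r"|layers\.(16|17|18|19|20|21)\.adaLN_modulation\.0\.weight"
)


def assign_format(tensor_name: str) -> str:
    """Return the storage format for a quantizable weight tensor.

    Assumption: exclusions win over custom-type matches, and anything
    unmatched falls back to the NVFP4 baseline.
    """
    if EXCLUDE_BF16.search(tensor_name):
        return "bf16"
    if CUSTOM_MXFP8.search(tensor_name):
        return "mxfp8"
    return "nvfp4"


if __name__ == "__main__":
    for name in (
        "layers.29.attention.qkv.weight",
        "layers.28.attention.qkv.weight",
        "layers.5.attention.qkv.weight",
        "layers.16.adaLN_modulation.0.weight",
        "layers.2.feed_forward.w1.weight",
    ):
        print(f"{name:40s} -> {assign_format(name)}")
```

Running the same check over every key in the source checkpoint gives per-format tensor counts that can be compared with the approximate figures in the Strategy section.
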
### Included files

| File | Description |
|---|---|
| `z_image_turbo_nvfp4_mixed.safetensors` | Quantized weights |
| `z_image_turbo_nvfp4_mixed_lora.safetensors` | Error-correction LoRA (rank 32) |

Use the LoRA at **1.5–2.0** strength in ComfyUI for maximum fidelity.

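
To make the strength setting concrete, here is a minimal sketch of how a LoRA strength conventionally scales a low-rank correction before it is added to a weight matrix. It is written in plain Python on nested lists, and it assumes the usual convention that the strength multiplies the product of the two low-rank factors; the function names and the tiny rank-1 matrices are illustrative and do not reproduce ComfyUI's loader or the contents of the LoRA file.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w_quant, lora_up, lora_down, strength=1.0):
    """Return w_quant + strength * (lora_up @ lora_down)."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + strength * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w_quant, delta)
    ]


if __name__ == "__main__":
    w_quant = [[1.0, 2.0], [3.0, 4.0]]  # stand-in for a dequantized weight
    lora_up = [[0.5], [0.25]]           # 2 x 1 factor (rank 1 for brevity)
    lora_down = [[1.0, 2.0]]            # 1 x 2 factor
    for strength in (1.0, 1.5, 2.0):
        print(strength, apply_lora(w_quant, lora_up, lora_down, strength))
```

At strength 1.0 the correction is added as stored; larger values scale the same correction linearly.
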

---

## Requirements

- **Inference**: CUDA 13.0+, PyTorch 2.8+, [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen), Blackwell GPU (RTX 50xx / B100 / B200)
- **Generation**: `convert_to_quant >= 1.2.6`, `comfy-kitchen`

---

## Comparison

| | NVFP4 Mixed (this) | [MXFP8 Uniform](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) | [Official NVFP4](https://huggingface.co/Comfy-Org/z_image_turbo) |
|---|---|---|---|
| **Size** | 4.84 GB | 6.23 GB | 4.51 GB |
| **Base format** | NVFP4 (4-bit) | MXFP8 (8-bit) | NVFP4 (4-bit) |
| **Custom layers** | ~100 tensors → MXFP8 | None | None |
| **BF16 exclusions** | ~20 surgical | 8 patterns | Refiners fully BF16 |
| **Learned rounding** | ✅ 6000 iter | ❌ `--simple` | ❌ |
| **LoRA** | ✅ rank 32 | ❌ | ❌ |
| **Refiner block 0** | MXFP8 | MXFP8 | BF16 |
| **Late adaLN (22–29)** | BF16 | BF16 | NVFP4 ⚠️ |
| **Last QKV (layer 29)** | BF16 | BF16 | NVFP4 ⚠️ |
| **Quantization time¹** | ~60–90 min | ~5–10 min | N/A |

¹ Estimated on RTX 5060 (Blackwell) with `comfy-kitchen` CUDA kernels.

---

## Methodology

Layer sensitivity was analyzed using [`quant_probe`](https://github.com/insecure-erasure/quant_probe), which computes per-tensor excess kurtosis, dynamic range, and aspect ratio, then scores them against the model's own distribution to recommend `*KEEP*`, `FP8`, or `NVFP4`.

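
As an illustration of the three per-tensor statistics named above, the sketch below computes them for a small 2-D weight in plain Python. It is not `quant_probe`'s implementation: the exact definitions (population moments, dynamic range as the log2 ratio of the largest to the smallest nonzero magnitude, aspect ratio as rows over columns) are assumptions, and the scoring step against the model-wide distribution is omitted.

```python
import math


def tensor_stats(weight):
    """Summary statistics for a 2-D weight given as a list of rows."""
    rows, cols = len(weight), len(weight[0])
    values = [v for row in weight for v in row]
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    # Excess kurtosis: 0 for a Gaussian, large when a few outliers dominate.
    excess_kurtosis = m4 / (m2 * m2) - 3.0 if m2 > 0 else 0.0
    magnitudes = [abs(v) for v in values if v != 0]
    # Dynamic range in octaves between largest and smallest nonzero magnitude.
    dynamic_range_log2 = (
        math.log2(max(magnitudes) / min(magnitudes)) if magnitudes else 0.0
    )
    return {
        "excess_kurtosis": excess_kurtosis,
        "dynamic_range_log2": dynamic_range_log2,
        "aspect_ratio": rows / cols,
    }


if __name__ == "__main__":
    flat = [[-1.0, 1.0], [1.0, -1.0]]    # no outliers, single magnitude
    spiky = [[0.1, -0.1, 0.1, 8.0]]      # one dominant outlier
    print(tensor_stats(flat))
    print(tensor_stats(spiky))
```

A tensor with heavy tails and a wide dynamic range is harder to fit on a 4-bit grid, which is the kind of signal such statistics are meant to surface.
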
Recommendations were cross-referenced against the DiT quantization literature:

- **PTQ4DiT** (NeurIPS 2024): salient channels in QKV + FFN, last blocks most affected
- **ViDiT-Q** (ICLR 2025): metric-decoupled sensitivity; self-attention dominates visual quality
- **HTG** (2025): channel-dependent outliers, severe in later blocks
- **SemanticDialect** (2026): block-wise mixed-format validated for video DiTs
- **SVDQuant** (ICLR 2025): low-rank branch absorbs 4-bit error, validated NVFP4

---

## Credits

- Layer sensitivity analysis via [`quant_probe`](https://github.com/insecure-erasure/quant_probe)
- Quantization engine: [`convert_to_quant`](https://github.com/silveroxides/convert_to_quant) by silveroxides
- Z-Image Turbo model by [Tongyi-MAI](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo)
- ComfyUI integration via [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen)