Update README.md
README.md
CHANGED
@@ -15,17 +15,34 @@ tags:
 - txt2img
 ---
 
-# Z-Image Turbo
+# Z-Image Turbo - NVFP4 Mixed-Precision
 
 Surgical mixed-precision quantization of [Z-Image Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) (6B S3-DiT), generated with [`convert_to_quant`](https://github.com/silveroxides/convert_to_quant).
 
 **Formats**: NVFP4 (baseline) + MXFP8 (sensitive layers) + BF16 (critical layers).
-**Size**: 4.84 GB (
+**Size**: 4.84 GB (-58% vs BF16).
 **Inference**: ComfyUI + [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen), Blackwell GPU (RTX 50xx / B100 / B200).
 
-Also available: [MXFP8 uniform quantization](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) (6.23 GB, near-lossless
+Also available: [MXFP8 uniform quantization](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) (6.23 GB, near-lossless).
 
---
+
+
+
+* **Prompt:**
+```
+A bust portrait of a woman in her mid-twenties with messy dark hair tied in a loose bun, wearing a worn denim jacket over a gray hoodie.
+She is leaning her elbows on a washing machine, her chin resting on her folded hands. Behind her, a row of industrial dryers against a tiled wall,
+with one dryer door hanging open. Above the dryers, a handwritten sign taped to the wall says 'OUT OF ORDER' in black marker,
+with a small smiley face drawn on it. To her left, a plastic basket overflows with unfolded clothes. To her right, a vending machine glows green,
+displaying 'SOAP $1.50' on a small digital screen. The light is cool and buzzing, like fluorescent tubes overhead. She looks tired but amused
+with a faint smirk.
+```
+* **Sampler/Scheduler:** Euler/Simple
+* **Steps:** 9
+* **CFG:** 1.0
+* **Shift:** 3.0
+* **Seed:** 920698660737993
+* **Resolution:** 1024 x 1536
 
 ## Strategy
 
@@ -63,8 +80,6 @@ Uses per-layer sensitivity analysis via [`quant_probe`](https://github.com/insec
 | `context_refiner` | All MXFP8 (qkv, w1, w2, w3) | qkv + w1 + w3 MXFP8, out + w2 BF16 |
 | `noise_refiner` | qkv + out + w1 + w2 MXFP8, w3 BF16 | qkv + out + w2 + w3 BF16, w1 MXFP8 |
 
----
-
 ## Generation
 
 ```bash
@@ -93,15 +108,11 @@ convert_to_quant -i $1 \
 
 Use the LoRA at **1.5–2.0** strength in ComfyUI for maximum fidelity.
 
----
-
 ## Requirements
 
 - **Inference**: CUDA 13.0+, PyTorch 2.10+, [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen), Blackwell GPU (RTX 50xx / B100 / B200)
 - **Generation**: `convert_to_quant >= 1.2.6`, `comfy-kitchen`
 
----
-
 ## Comparison
 
 | | NVFP4 Mixed (this) | [MXFP8 Uniform](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) | [Official NVFP4](https://huggingface.co/Comfy-Org/z_image_turbo) |
@@ -119,8 +130,6 @@ Use the LoRA at **1.5–2.0** strength in ComfyUI for maximum fidelity.
 
 ¹ Estimated on RTX 5060 (Blackwell) with `comfy-kitchen` CUDA kernels.
 
----
-
 ## Methodology
 
 Layer sensitivity was analyzed using [`quant_probe`](https://github.com/insecure-erasure/quant_probe), which computes per-tensor excess kurtosis, dynamic range, and aspect ratio, then scores them against the model's own distribution to recommend `*KEEP*`, `FP8`, or `NVFP4`.
@@ -133,8 +142,6 @@ Recommendations were cross-referenced against the DiT quantization literature:
 - **SemanticDialect** (2026) — block-wise mixed-format validated for video DiTs
 - **SVDQuant** (ICLR 2025) — low-rank branch absorbs 4-bit error, validated NVFP4
 
----
-
 ## Credits
 
 - Quantization engine: [`convert_to_quant`](https://github.com/silveroxides/convert_to_quant) by silveroxides
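The Methodology paragraph kept as context in this diff says `quant_probe` computes per-tensor excess kurtosis, dynamic range, and aspect ratio. The sketch below shows one plausible way to compute those three statistics with NumPy. It is not taken from `quant_probe`: the function name `tensor_stats`, the dynamic-range definition (largest over smallest nonzero magnitude), and the aspect-ratio definition (longer over shorter side of the flattened 2-D shape) are all assumptions made for illustration, and the scoring step that maps statistics to `*KEEP*`, `FP8`, or `NVFP4` is not reproduced.

```python
import numpy as np


def tensor_stats(weight: np.ndarray) -> dict:
    """Hypothetical helper: summary statistics for one weight tensor (ndim >= 1)."""
    w = weight.astype(np.float64).ravel()
    mean = w.mean()
    var = w.var()

    # Excess kurtosis: fourth standardized moment minus 3, so a Gaussian scores ~0
    # and heavy-tailed (outlier-prone) tensors score well above 0.
    if var > 0:
        excess_kurtosis = float(((w - mean) ** 4).mean() / var**2 - 3.0)
    else:
        excess_kurtosis = 0.0

    # Dynamic range (assumed definition): largest magnitude over the smallest
    # nonzero magnitude.
    magnitudes = np.abs(w)
    nonzero = magnitudes[magnitudes > 0]
    dynamic_range = float(nonzero.max() / nonzero.min()) if nonzero.size else 1.0

    # Aspect ratio (assumed definition): treat the tensor as (rows, everything else)
    # and divide the longer side by the shorter one.
    rows = weight.shape[0]
    cols = int(np.prod(weight.shape[1:]))
    aspect_ratio = max(rows, cols) / min(rows, cols)

    return {
        "excess_kurtosis": excess_kurtosis,
        "dynamic_range": dynamic_range,
        "aspect_ratio": aspect_ratio,
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gaussian = rng.normal(size=(256, 1024))
    heavy_tailed = rng.standard_t(df=3, size=(256, 1024))
    print("gaussian:    ", tensor_stats(gaussian))
    print("heavy-tailed:", tensor_stats(heavy_tailed))
```

Running the script prints the statistics for a synthetic Gaussian tensor and a heavy-tailed one. The heavy-tailed tensor should show a much larger excess kurtosis, which is the kind of signal a sensitivity probe could use to keep a layer in a higher-precision format.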