InsecureErasure / Z-Image-Turbo-NVFP4

@@ -15,17 +15,34 @@ tags:
   - txt2img
 ---
-# Z-Image Turbo — NVFP4 Mixed-Precision
+# Z-Image Turbo - NVFP4 Mixed-Precision
 Surgical mixed-precision quantization of [Z-Image Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) (6B S3-DiT), generated with [`convert_to_quant`](https://github.com/silveroxides/convert_to_quant).
 **Formats**: NVFP4 (baseline) + MXFP8 (sensitive layers) + BF16 (critical layers).
-**Size**: 4.84 GB (−58% vs BF16).
+**Size**: 4.84 GB (-58% vs BF16).
 **Inference**: ComfyUI + [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen), Blackwell GPU (RTX 50xx / B100 / B200).
-Also available: [MXFP8 uniform quantization](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) (6.23 GB, near-lossless, simpler).
----
+Also available: [MXFP8 uniform quantization](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) (6.23 GB, near-lossless).
+![BF16 vs NVFP4](images/BF16-NVFP4-comp.png)
+![NVFP4 vs NVFP4 plus rank-32 LoRA](images/NVFP4-LoRA-comp.png)
+* **Prompt:**
+```
+A bust portrait of a woman in her mid-twenties with messy dark hair tied in a loose bun, wearing a worn denim jacket over a gray hoodie.
+She is leaning her elbows on a washing machine, her chin resting on her folded hands. Behind her, a row of industrial dryers against a tiled wall,
+with one dryer door hanging open. Above the dryers, a handwritten sign taped to the wall says 'OUT OF ORDER' in black marker,
+with a small smiley face drawn on it. To her left, a plastic basket overflows with unfolded clothes. To her right, a vending machine glows green,
+displaying 'SOAP $1.50' on a small digital screen. The light is cool and buzzing, like fluorescent tubes overhead. She looks tired but amused
+with a faint smirk.
+```
+* **Sampler/Scheduler:** Euler/Simple
+* **Steps:** 9
+* **CFG:** 1.0
+* **Shift:** 3.0
+* **Seed:** 920698660737993
+* **Resolution:** 1024 x 1536
 ## Strategy
@@ -63,8 +80,6 @@ Uses per-layer sensitivity analysis via [`quant_probe`](https://github.com/insec
 | `context_refiner` | All MXFP8 (qkv, w1, w2, w3) | qkv + w1 + w3 MXFP8, out + w2 BF16 |
 | `noise_refiner` | qkv + out + w1 + w2 MXFP8, w3 BF16 | qkv + out + w2 + w3 BF16, w1 MXFP8 |
----
 ## Generation
 ```bash
@@ -93,15 +108,11 @@ convert_to_quant -i $1 \
 Use the LoRA at **1.5–2.0** strength in ComfyUI for maximum fidelity.
----
 ## Requirements
 - **Inference**: CUDA 13.0+, PyTorch 2.10+, [`comfy-kitchen`](https://github.com/Comfy-Org/comfy-kitchen), Blackwell GPU (RTX 50xx / B100 / B200)
 - **Generation**: `convert_to_quant >= 1.2.6`, `comfy-kitchen`
----
 ## Comparison
 | | NVFP4 Mixed (this) | [MXFP8 Uniform](https://huggingface.co/InsecureErasure/Z-Image-Turbo-MXFP8) | [Official NVFP4](https://huggingface.co/Comfy-Org/z_image_turbo) |
@@ -119,8 +130,6 @@ Use the LoRA at **1.5–2.0** strength in ComfyUI for maximum fidelity.
 ¹ Estimated on RTX 5060 (Blackwell) with `comfy-kitchen` CUDA kernels.
----
 ## Methodology
 Layer sensitivity was analyzed using [`quant_probe`](https://github.com/insecure-erasure/quant_probe), which computes per-tensor excess kurtosis, dynamic range, and aspect ratio, then scores them against the model's own distribution to recommend `*KEEP*`, `FP8`, or `NVFP4`.
@@ -133,8 +142,6 @@ Recommendations were cross-referenced against the DiT quantization literature:
 - **SemanticDialect** (2026) — block-wise mixed-format validated for video DiTs
 - **SVDQuant** (ICLR 2025) — low-rank branch absorbs 4-bit error, validated NVFP4
----
 ## Credits
 - Quantization engine: [`convert_to_quant`](https://github.com/silveroxides/convert_to_quant) by silveroxides
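The sizes quoted in this card (4.84 GB mixed, 6.23 GB for MXFP8 uniform, -58% vs BF16) follow from each format's effective bits per weight. Below is a rough estimator that assumes the standard block layouts. NVFP4 stores 4-bit E2M1 elements with one FP8 E4M3 scale per 16 elements. MXFP8 stores 8-bit FP8 elements with one E8M0 scale per 32 elements. The estimator ignores NVFP4's per-tensor FP32 scale and any file metadata. The function names are hypothetical. The card does not state the actual per-format parameter split, so any counts you pass in are your own assumptions.

```python
# Effective storage per weight, in bits, including shared block-scale overhead.
# Assumption: standard NVFP4 / MXFP8 block layouts; per-tensor scales ignored.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "mxfp8": 8.0 + 8.0 / 32,  # FP8 element + one 8-bit E8M0 scale per 32 elements
    "nvfp4": 4.0 + 8.0 / 16,  # E2M1 element + one 8-bit E4M3 scale per 16 elements
}


def estimate_bytes(param_counts: dict[str, int]) -> float:
    """Approximate size in bytes for a {format: parameter count} split."""
    return sum(n * BITS_PER_WEIGHT[fmt] / 8 for fmt, n in param_counts.items())


def savings_vs_bf16(param_counts: dict[str, int]) -> float:
    """Fractional size reduction relative to storing every parameter in BF16."""
    total = sum(param_counts.values())
    return 1.0 - estimate_bytes(param_counts) / (total * 2)


if __name__ == "__main__":
    # Hypothetical 6e9-parameter model, quantized uniformly to each format.
    n = 6_000_000_000
    for fmt in ("bf16", "mxfp8", "nvfp4"):
        size_gb = estimate_bytes({fmt: n}) / 1e9
        print(f"{fmt:>6}: {size_gb:.2f} GB, {savings_vs_bf16({fmt: n}):.0%} smaller than BF16")
```

For a uniform 6e9-parameter split, the estimator gives about 6.19 GB for MXFP8 and 3.38 GB for NVFP4. A mixed NVFP4/MXFP8/BF16 checkpoint lands between the two. Tensors left unquantized (norms, embeddings, biases) push real files higher. The card's -58% implies a BF16 reference of roughly 4.84 / 0.42 ≈ 11.5 GB.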
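Fake-quantizing a single 16-element NVFP4 block shows why outlier-heavy tensors are poor NVFP4 candidates: every element in the block shares one scale. The sketch below is a simplification under stated assumptions. It keeps the block scale in full precision, whereas real NVFP4 rounds it to FP8 E4M3 and also applies a per-tensor FP32 scale. It also breaks rounding ties toward the smaller magnitude instead of following a hardware rounding mode. The name `fake_quant_nvfp4_block` is hypothetical.

```python
import math

# Non-negative magnitudes representable by FP4 E2M1 (the sign is stored separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive elements


def fake_quant_nvfp4_block(block: list[float]) -> list[float]:
    """Quantize one block to E2M1 with a shared scale, then dequantize it.

    Simplification: the scale stays in full precision instead of FP8 E4M3.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} elements, got {len(block)}")
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * BLOCK_SIZE
    # Map the largest magnitude in the block onto the largest E2M1 value (6.0).
    scale = amax / 6.0
    result = []
    for x in block:
        mag = abs(x) / scale
        # Nearest grid point; min() keeps the first (smaller) point on ties.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        result.append(math.copysign(q * scale, x))
    return result


if __name__ == "__main__":
    # One outlier stretches the shared scale, so the small weights collapse to zero.
    outlier_block = [6.0] + [0.1] * 15
    print(fake_quant_nvfp4_block(outlier_block))
```

In a flat block such as `[0.1] * 16`, the scale shrinks and the same 0.1 values map onto the top grid point, so they are reproduced up to floating-point rounding. This is one plausible reason why high-kurtosis tensors are routed to MXFP8 or BF16 instead.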
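The Methodology line says `quant_probe` computes per-tensor excess kurtosis, dynamic range, and aspect ratio, then scores them against the model's own distribution. Below is a plain-Python illustration of that idea; it is not `quant_probe`'s code. Several choices are assumptions made for illustration:

- the dynamic-range definition (max |x| / mean |x|);
- the z-score normalization;
- the rule that the most anomalous metric decides;
- the `keep_z` / `fp8_z` thresholds.

All names (`TensorStats`, `tensor_stats`, `recommend`) are hypothetical.

```python
import statistics
from dataclasses import dataclass


@dataclass
class TensorStats:
    name: str
    kurtosis: float       # excess kurtosis: 0 for a Gaussian, large for outlier-heavy tensors
    dynamic_range: float  # max|x| / mean|x| (assumed definition)
    aspect_ratio: float   # longer side / shorter side of the 2-D weight


def tensor_stats(name: str, values: list[float], shape: tuple[int, int]) -> TensorStats:
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # population variance
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    kurtosis = m4 / m2**2 - 3.0 if m2 > 0 else 0.0
    mean_abs = sum(abs(v) for v in values) / n
    dynamic_range = max(abs(v) for v in values) / mean_abs if mean_abs > 0 else 0.0
    rows, cols = shape
    return TensorStats(name, kurtosis, dynamic_range, max(rows, cols) / min(rows, cols))


def zscores(xs: list[float]) -> list[float]:
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in xs]


def recommend(stats: list[TensorStats], keep_z: float = 1.5, fp8_z: float = 0.5) -> dict[str, str]:
    # Normalize each metric against this model's own distribution of that metric.
    metrics = ("kurtosis", "dynamic_range", "aspect_ratio")
    z_by_metric = [zscores([getattr(s, m) for s in stats]) for m in metrics]
    decisions = {}
    for i, s in enumerate(stats):
        score = max(z[i] for z in z_by_metric)  # the most anomalous metric decides
        if score >= keep_z:
            decisions[s.name] = "*KEEP*"
        elif score >= fp8_z:
            decisions[s.name] = "FP8"
        else:
            decisions[s.name] = "NVFP4"
    return decisions


if __name__ == "__main__":
    # Hypothetical tensors: four well-behaved ones and one heavy-tailed outlier.
    layers = [TensorStats(f"block.{i}.w1", 0.0, 2.0, 1.0) for i in range(4)]
    layers.append(TensorStats("outlier.w2", 20.0, 2.0, 1.0))
    print(recommend(layers))
```

Scoring relative to the model's own distribution, rather than against fixed absolute cutoffs, flags a tensor only when it stands out among this model's tensors. The same thresholds can then be reused across models with different weight statistics.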