Z-Image Base NVFP4 Quantized Models
NVFP4 (NVIDIA's 4-bit floating-point format) quantized versions of the Tongyi-MAI/Z-Image Base model for ComfyUI.
These quantizations offer different trade-offs between quality and file size, allowing you to choose the best option for your hardware and quality requirements.
Model Variants
| Variant | NVFP4 Layers | What stays in BF16 | Size | Quality |
|---|---|---|---|---|
| Ultra | 60 | Attention + layers 0-4 & 25-29 | ~8.0 GB | ★★★★★ |
| Quality | 90 | All attention (qkv, out) | ~6.5 GB | ★★★ |
| Mixed | 180 | Refiners, embedders, final layer | ~4.5 GB | ★ |
| Full | 204 | Only critical embedders | ~3.5 GB | ★ |
Original BF16 model size: 12.3 GB
Which variant should I use?
- Ultra: Best quality, closest to original BF16. Use if you have enough VRAM and want maximum fidelity.
- Quality: Excellent quality with significant size reduction. Recommended for most users.
- Mixed: Poor quality; not recommended.
- Full: Poor quality; not recommended.
Technical Details
Quantization Strategy
The key insight is that attention layers (qkv, out) are much more sensitive to quantization than feed_forward layers (w1, w2, w3).
- Ultra quantizes only feed_forward in the middle layers (5-24), keeping the first five layers (0-4), the last five layers (25-29), and all attention in BF16
- Quality quantizes all feed_forward but keeps all attention in BF16
- Mixed quantizes everything in the 30 main transformer layers
- Full additionally quantizes context_refiner, noise_refiner, and t_embedder
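The selection rules above can be sketched as a small predicate over layer names. This is only an illustration, not the conversion script itself: the key patterns (`layers.{i}.feed_forward.w1`, `layers.{i}.attention.qkv`, and the `context_refiner.` / `noise_refiner.` / `t_embedder.` prefixes) and the function name `should_quantize` are assumptions made for the example.

```python
import re

# Assumed (hypothetical) naming of the feed-forward projections in a main layer.
FEED_FORWARD = ("feed_forward.w1", "feed_forward.w2", "feed_forward.w3")
# Assumed prefixes of the extra modules that only the Full variant quantizes.
FULL_ONLY_PREFIXES = ("context_refiner.", "noise_refiner.", "t_embedder.")
VARIANTS = ("ultra", "quality", "mixed", "full")


def should_quantize(name: str, variant: str) -> bool:
    """Return True if the layer `name` would be stored as NVFP4 in `variant`."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant}")

    match = re.match(r"layers\.(\d+)\.(.+)", name)
    if match:
        index, suffix = int(match.group(1)), match.group(2)
        is_feed_forward = suffix in FEED_FORWARD
        if variant == "ultra":
            # Only feed_forward in the middle layers 5-24.
            return is_feed_forward and 5 <= index <= 24
        if variant == "quality":
            # All feed_forward, attention stays in BF16.
            return is_feed_forward
        # Mixed and Full quantize everything in the main layers.
        return True

    # Outside the main layers, only Full quantizes the refiners and t_embedder.
    return variant == "full" and name.startswith(FULL_ONLY_PREFIXES)
```

With 30 main layers and three feed_forward projections each, this predicate selects 20 × 3 = 60 layers for Ultra and 30 × 3 = 90 for Quality, matching the table.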
NVFP4 Format Structure
Each quantized layer contains:
- {layer}.weight: uint8 (2 FP4 values packed per byte)
- {layer}.weight_scale: float8_e4m3fn, 2D (per-block scale, 16-element blocks)
- {layer}.weight_scale_2: float32, scalar (per-tensor scale)
- {layer}.input_scale: float32, scalar (activation scale)
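To make the layout concrete, here is a minimal NumPy sketch of how such a weight could be decoded back to floating point. It rests on several assumptions: the first element of each pair is taken from the low nibble, the block scales have already been converted from float8_e4m3fn to float32 and are stored row-major without any swizzling, and the two scales are applied as a plain product. The actual ComfyUI kernels may differ on any of these points.

```python
import numpy as np

# FP4 (E2M1) magnitudes indexed by the low 3 bits of a code; bit 3 is the sign.
FP4_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
BLOCK_SIZE = 16  # elements that share one per-block scale


def unpack_fp4(packed: np.ndarray) -> np.ndarray:
    """Decode a uint8 array of shape (rows, cols / 2) into float32 (rows, cols)."""
    low = packed & 0x0F   # assumed: first element of each pair
    high = packed >> 4    # assumed: second element of each pair
    codes = np.stack([low, high], axis=-1).reshape(packed.shape[0], -1)
    sign = np.where(codes & 0x8, -1.0, 1.0).astype(np.float32)
    return sign * FP4_MAGNITUDES[codes & 0x7]


def dequantize_nvfp4(packed, block_scale, tensor_scale):
    """Apply per-block and per-tensor scales to the unpacked FP4 values.

    packed:       uint8, shape (rows, cols / 2)      -> {layer}.weight
    block_scale:  float32, shape (rows, cols / 16)   -> {layer}.weight_scale (pre-converted)
    tensor_scale: scalar                             -> {layer}.weight_scale_2
    """
    values = unpack_fp4(packed)
    scales = np.repeat(np.asarray(block_scale, dtype=np.float32), BLOCK_SIZE, axis=1)
    return values * scales * np.float32(tensor_scale)


# One row of 16 weights: every byte 0x21 holds the codes 1 (0.5) and 2 (1.0).
packed = np.full((1, 8), 0x21, dtype=np.uint8)
weights = dequantize_nvfp4(packed, block_scale=[[2.0]], tensor_scale=0.5)
print(weights[0, :4])  # [0.5 1.  0.5 1. ]
```

The input_scale entry is not used here; it applies to activations at inference time rather than to the stored weights.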
Usage in ComfyUI
Requirements
Warning: NVFP4 requires specific hardware and software!
- GPU: NVIDIA Blackwell series (RTX 5080 / 5090) - NVFP4 is a Blackwell-exclusive feature
- PyTorch: 2.9.0+ with CUDA 13.0 (cu130); older versions do not support NVFP4
- ComfyUI: Latest version (updated regularly)
- comfy-kitchen: >= 0.2.7
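The requirements above can be checked with a short script before loading the model. The sketch below is not part of ComfyUI; the function names are hypothetical, the test "compute capability major version of 10 or higher" is only a heuristic for detecting a Blackwell GPU, and `comfy-kitchen` as the installed distribution name is an assumption.

```python
import re


def parse_version(text):
    """Turn a string such as '2.9.0+cu130' into a comparable tuple (2, 9, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def nvfp4_problems(torch_version, cuda_version, capability, kitchen_version):
    """Return a list of unmet requirements; an empty list means all checks passed."""
    problems = []
    if parse_version(torch_version) < (2, 9, 0):
        problems.append("PyTorch 2.9.0 or newer is required")
    if cuda_version is None or parse_version(cuda_version) < (13, 0, 0):
        problems.append("a CUDA 13.0 (cu130) build of PyTorch is required")
    # Heuristic (assumption): Blackwell GPUs report a major capability of 10 or higher.
    if capability is None or capability[0] < 10:
        problems.append("no Blackwell-class GPU detected")
    if kitchen_version is None or parse_version(kitchen_version) < (0, 2, 7):
        problems.append("comfy-kitchen 0.2.7 or newer is required")
    return problems


def local_environment():
    """Collect the four values from the running Python environment (needs torch)."""
    import torch
    from importlib import metadata

    capability = torch.cuda.get_device_capability() if torch.cuda.is_available() else None
    try:
        kitchen = metadata.version("comfy-kitchen")  # assumed distribution name
    except metadata.PackageNotFoundError:
        kitchen = None
    return torch.__version__, torch.version.cuda, capability, kitchen
```

Running `print(nvfp4_problems(*local_environment()))` inside the Python environment that ComfyUI uses lists whatever is missing.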
Recommended Settings
Model: z-image-base-nvfp4_[variant].safetensors
Steps: 28-50
CFG Scale: 3.0-5.0
Note: This is Z-Image Base, not Turbo. Use 28-50 steps with CFG guidance, not 8 steps like Turbo.
Model Architecture
Z-Image Base is a 6B-parameter diffusion transformer based on the NextDiT architecture:
- 30 main transformer layers
- 2 context refiner layers
- 2 noise refiner layers
- Hidden dimension: 3840
- Attention heads: 30
- Supports CFG (Classifier-Free Guidance)
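As a worked example of what these dimensions mean for storage, the sketch below derives the per-head dimension and compares the BF16 and NVFP4 footprint of a single weight matrix. The 3840 × 3840 shape is a hypothetical square projection chosen for illustration, not a confirmed layer shape; the byte counts follow from the NVFP4 layout described earlier (two values per byte, one 1-byte scale per 16 elements, two float32 scalars).

```python
HIDDEN = 3840
HEADS = 30
BLOCK_SIZE = 16

head_dim = HIDDEN // HEADS
print(head_dim)  # 128


def bf16_bytes(rows, cols):
    """Two bytes per weight."""
    return rows * cols * 2


def nvfp4_bytes(rows, cols):
    """Packed FP4 values + one float8 scale per 16-element block + two float32 scalars."""
    packed = rows * cols // 2
    block_scales = rows * (cols // BLOCK_SIZE)
    scalars = 2 * 4  # weight_scale_2 and input_scale
    return packed + block_scales + scalars


print(bf16_bytes(HIDDEN, HIDDEN))   # 29491200
print(nvfp4_bytes(HIDDEN, HIDDEN))  # 8294408
print(round(bf16_bytes(HIDDEN, HIDDEN) / nvfp4_bytes(HIDDEN, HIDDEN), 2))  # 3.56
```

That works out to about 4.5 bits per weight, which is broadly in line with the gap between the 12.3 GB BF16 original and the ~3.5 GB Full variant, where only a few layers stay in BF16.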
Credits
- Original model: Tongyi-MAI/Z-Image by Alibaba
- Quantization format: ComfyUI NVFP4 implementation
- Conversion script: Custom Python script using ComfyUI's TensorCoreNVFP4Layout
License
Please refer to the original model's license at Tongyi-MAI/Z-Image.