Z-Image Base NVFP4 Quantized Models

NVFP4 (NVIDIA's 4-bit floating-point format) quantized versions of the Tongyi-MAI/Z-Image Base model for ComfyUI.

These quantizations offer different trade-offs between quality and file size, allowing you to choose the best option for your hardware and quality requirements.

📊 Model Variants

Variant	NVFP4 Layers	What stays in BF16	Size	Quality
Ultra	60	Attention + layers 0-4 & 25-29	~8.0 GB	⭐⭐⭐⭐⭐
Quality	90	All attention (qkv, out)	~6.5 GB	⭐⭐⭐
Mixed	180	Refiners, embedders, final layer	~4.5 GB	⭐
Full	204	Only critical embedders	~3.5 GB	⭐

Original BF16 model size: 12.3 GB
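
To make the size trade-off concrete, the short sketch below computes how much smaller each variant is than the BF16 original. The sizes are copied from the table above; the function name `size_saving_percent` is only illustrative.

```python
# Sizes in GB, copied from the variants table above (approximate values).
BF16_SIZE_GB = 12.3
VARIANT_SIZE_GB = {"Ultra": 8.0, "Quality": 6.5, "Mixed": 4.5, "Full": 3.5}


def size_saving_percent(variant_gb, baseline_gb=BF16_SIZE_GB):
    """Percentage of file size saved relative to the BF16 baseline."""
    return 100.0 * (1.0 - variant_gb / baseline_gb)


for name, size_gb in VARIANT_SIZE_GB.items():
    saving = size_saving_percent(size_gb)
    print(f"{name}: ~{size_gb} GB, roughly {saving:.0f}% smaller than BF16")
```

Because the listed sizes are rounded, the percentages are rough guides rather than exact figures.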

🎯 Which variant should I use?

Ultra: Best quality, closest to original BF16. Use if you have enough VRAM and want maximum fidelity.
Quality: Excellent quality with significant size reduction. Recommended for most users.
Mixed: Poor quality; not recommended.
Full: Poor quality; not recommended.

🔧 Technical Details

Quantization Strategy

The key insight is that attention layers (qkv, out) are much more sensitive to quantization than feed_forward layers (w1, w2, w3).

Ultra only quantizes feed_forward in middle layers (5-24), keeping first/last layers and all attention in BF16
Quality quantizes all feed_forward but keeps all attention in BF16
Mixed quantizes everything in the 30 main transformer layers
Full additionally quantizes context_refiner, noise_refiner, and t_embedder
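
The selection rules for the two recommended variants can be sketched as a small predicate. This is an illustration of the rules listed above, not the actual conversion script: the function names are hypothetical, and Mixed and Full are left out because the exact set of sublayers they quantize is not spelled out here.

```python
NUM_MAIN_LAYERS = 30
FEED_FORWARD = ("w1", "w2", "w3")   # feed_forward projections
ATTENTION = ("qkv", "out")          # attention projections


def is_quantized(variant, layer_index, sublayer):
    """Return True if this main-layer sublayer would be stored as NVFP4.

    Only the 'ultra' and 'quality' rules are modelled (hypothetical helper).
    """
    if variant not in ("ultra", "quality"):
        raise ValueError(f"variant not modelled in this sketch: {variant!r}")
    if sublayer not in FEED_FORWARD:
        return False  # attention stays in BF16 for both variants
    if variant == "quality":
        return True   # every feed_forward layer is quantized
    return 5 <= layer_index <= 24  # ultra: middle layers only


def count_quantized(variant):
    """Count NVFP4 sublayers across the 30 main transformer layers."""
    return sum(
        is_quantized(variant, index, sublayer)
        for index in range(NUM_MAIN_LAYERS)
        for sublayer in FEED_FORWARD + ATTENTION
    )


print(count_quantized("ultra"), count_quantized("quality"))
```

Under these rules the counts come out to 60 and 90, matching the NVFP4 Layers column for Ultra and Quality.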

NVFP4 Format Structure

Each quantized layer contains:

{layer}.weight: uint8 (2 FP4 values packed per byte)
{layer}.weight_scale: float8_e4m3fn, 2D (per-block scale, 16-element blocks)
{layer}.weight_scale_2: float32, scalar (per-tensor scale)
{layer}.input_scale: float32, scalar (activation scale)
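
As a rough illustration of how these tensors fit together, the pure-Python sketch below dequantizes one row of packed weights. It assumes the standard FP4 E2M1 value set, that the low nibble of each byte holds the first value, and that block scales are stored in plain row order and have already been converted to Python floats. The layout actually used by ComfyUI may differ (for example, scales may be rearranged for the GPU kernels), so treat this as a conceptual sketch only.

```python
# Magnitudes representable by FP4 E2M1; bit 3 of the nibble is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_fp4(nibble):
    """Decode one 4-bit E2M1 code (0..15) into a float."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[nibble & 0x7]


def dequantize_row(packed, block_scales, tensor_scale, block_size=16):
    """Dequantize one row of packed uint8 weights.

    packed       : bytes, two FP4 values per byte (low nibble first; assumed)
    block_scales : one float per 16-element block (weight_scale)
    tensor_scale : per-tensor scalar (weight_scale_2)
    """
    values = []
    for byte in packed:
        values.append(decode_fp4(byte & 0x0F))  # first value: low nibble
        values.append(decode_fp4(byte >> 4))    # second value: high nibble
    return [
        value * block_scales[index // block_size] * tensor_scale
        for index, value in enumerate(values)
    ]


# One 16-element block: eight bytes, each holding codes 1 (0.5) and 2 (1.0).
row = dequantize_row(bytes([0x21] * 8), block_scales=[2.0], tensor_scale=0.5)
print(row[:4])
```

With one 8-bit scale shared by every 16 four-bit values, the storage cost is about 4.5 bits per weight, ignoring the per-tensor scalars.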

💻 Usage in ComfyUI

Requirements

⚠️ NVFP4 requires specific hardware and software!

GPU: NVIDIA Blackwell series (e.g. RTX 5080 / 5090) - NVFP4 is a Blackwell-exclusive feature
PyTorch: 2.9.0+ with CUDA 13.0 (cu130) - older versions do not support NVFP4
ComfyUI: Latest version (updated regularly)
comfy-kitchen: >= 0.2.7
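
A quick way to check the PyTorch and GPU requirements is sketched below. The helper names are hypothetical. The sketch assumes that a CUDA compute capability major version of 10 or higher indicates a Blackwell-or-newer GPU, and it reads "CUDA 13.0" as "13.0 or newer". It does not check the ComfyUI or comfy-kitchen versions.

```python
def parse_major_minor(version):
    """Turn a string like '2.9.0+cu130' or '13.0' into (major, minor)."""
    core = version.split("+")[0]
    parts = core.split(".")
    return int(parts[0]), int(parts[1])


def check_requirements(torch_version, cuda_version, compute_capability):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if parse_major_minor(torch_version) < (2, 9):
        problems.append(f"PyTorch {torch_version} is older than 2.9.0")
    if cuda_version is None or parse_major_minor(cuda_version) < (13, 0):
        problems.append(f"CUDA build {cuda_version} is older than 13.0")
    if compute_capability[0] < 10:  # assumption: 10+ means Blackwell or newer
        problems.append(f"compute capability {compute_capability} is pre-Blackwell")
    return problems


def probe_local_environment():
    """Run the checks against the locally installed PyTorch, if any."""
    import torch  # imported lazily so the helpers above work without it

    if not torch.cuda.is_available():
        return ["no CUDA device visible to PyTorch"]
    return check_requirements(
        torch.__version__,
        torch.version.cuda,
        torch.cuda.get_device_capability(),
    )
```

Calling `probe_local_environment()` inside the Python environment that ComfyUI uses should return an empty list when these three requirements appear to be met.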

Recommended Settings

Model: z-image-base-nvfp4_[variant].safetensors
Steps: 28-50
CFG Scale: 3.0-5.0

⚠️ Note: This is Z-Image Base, not Turbo. Use 28-50 steps with CFG guidance, not 8 steps like Turbo.

📝 Model Architecture

Z-Image Base is a 6B-parameter diffusion transformer based on the NextDiT architecture:

30 main transformer layers
2 context refiner layers
2 noise refiner layers
Hidden dimension: 3840
Attention heads: 30
Supports CFG (Classifier-Free Guidance)

🙏 Credits

Original model: Tongyi-MAI/Z-Image by Alibaba
Quantization format: ComfyUI NVFP4 implementation
Conversion script: Custom Python script using ComfyUI's TensorCoreNVFP4Layout

📄 License

Please refer to the original model's license at Tongyi-MAI/Z-Image.
