
Z-Image Base NVFP4 Quantized Models

NVFP4 (NVIDIA 4-bit floating point, E2M1) quantized versions of the Tongyi-MAI/Z-Image Base model for ComfyUI.

These quantizations offer different trade-offs between quality and file size, allowing you to choose the best option for your hardware and quality requirements.

πŸ“Š Model Variants

| Variant | NVFP4 Layers | What stays in BF16 | Size | Quality |
|---------|--------------|--------------------|------|---------|
| Ultra   | 60  | Attention + layers 0-4 & 25-29   | ~8.0 GB | ⭐⭐⭐⭐⭐ |
| Quality | 90  | All attention (qkv, out)         | ~6.5 GB | ⭐⭐⭐ |
| Mixed   | 180 | Refiners, embedders, final layer | ~4.5 GB | ⭐ |
| Full    | 204 | Only critical embedders          | ~3.5 GB | ⭐ |
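The sizes follow from NVFP4's per-parameter storage cost: a 4-bit code (two packed per byte) plus a share of one 1-byte FP8 scale per 16-element block (see the format structure below). A back-of-the-envelope sketch in plain Python:

```python
# Per-parameter storage cost, from the NVFP4 layout:
# a 4-bit code (two packed per byte) plus one FP8 (1-byte)
# block scale shared by 16 elements.
BF16_BYTES_PER_PARAM = 2.0
NVFP4_BYTES_PER_PARAM = 0.5 + 1.0 / 16   # = 0.5625

compression_ratio = BF16_BYTES_PER_PARAM / NVFP4_BYTES_PER_PARAM
print(f"~{compression_ratio:.2f}x smaller per quantized tensor")  # ~3.56x
```

The scalar `weight_scale_2` and `input_scale` tensors add negligible overhead, which is roughly consistent with the Full variant shrinking a 12.3 GB checkpoint to about 3.5 GB.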

Original BF16 model size: 12.3 GB

🎯 Which variant should I use?

  • Ultra: Best quality, closest to original BF16. Use if you have enough VRAM and want maximum fidelity.
  • Quality: Excellent quality with significant size reduction. Recommended for most users.
  • Mixed: Noticeably degraded quality; not recommended.
  • Full: Smallest files, but poor quality; not recommended.

πŸ”§ Technical Details

Quantization Strategy

The key insight is that attention layers (qkv, out) are much more sensitive to quantization than feed_forward layers (w1, w2, w3).

  • Ultra only quantizes feed_forward in middle layers (5-24), keeping first/last layers and all attention in BF16
  • Quality quantizes all feed_forward but keeps all attention in BF16
  • Mixed quantizes everything in the 30 main transformer layers
  • Full additionally quantizes context_refiner, noise_refiner, and t_embedder
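The per-variant selection above can be sketched as a predicate over weight names. This is an illustrative sketch only: the layer-name patterns are hypothetical (modeled on typical NextDiT state dicts) and the actual conversion script may key on different names.

```python
def quantize_layer(name: str, variant: str) -> bool:
    """Decide whether a weight tensor should be NVFP4-quantized.

    Layer-name patterns are illustrative, not taken from the real checkpoint.
    """
    is_ffn = any(k in name for k in
                 ("feed_forward.w1", "feed_forward.w2", "feed_forward.w3"))
    is_refiner = "refiner" in name or "t_embedder" in name

    # Index of the main transformer layer, e.g. "layers.17.feed_forward.w1.weight"
    layer_idx = None
    parts = name.split(".")
    if "layers" in parts:
        layer_idx = int(parts[parts.index("layers") + 1])

    if variant == "ultra":      # feed_forward only, middle layers 5-24
        return is_ffn and layer_idx is not None and 5 <= layer_idx <= 24
    if variant == "quality":    # all feed_forward, attention stays BF16
        return is_ffn
    if variant == "mixed":      # everything inside the 30 main layers
        return layer_idx is not None
    if variant == "full":       # main layers plus refiners and t_embedder
        return layer_idx is not None or is_refiner
    raise ValueError(f"unknown variant: {variant}")
```

For example, `quantize_layer("layers.2.feed_forward.w1.weight", "ultra")` is False (layer 2 is outside the 5-24 window), while the same name under "quality" is True.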

NVFP4 Format Structure

Each quantized layer contains:

  • {layer}.weight: uint8 (2 FP4 values packed per byte)
  • {layer}.weight_scale: float8_e4m3fn, 2D (per-block scale, 16-element blocks)
  • {layer}.weight_scale_2: float32, scalar (per-tensor scale)
  • {layer}.input_scale: float32, scalar (activation scale)
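To make the layout concrete, here is a hedged NumPy sketch of dequantizing one tensor. The E2M1 code table is the standard FP4 one; the nibble packing order is an assumption and may differ from ComfyUI's actual TensorCoreNVFP4Layout.

```python
import numpy as np

# E2M1 (FP4) code -> value lookup; codes 8-15 mirror 0-7 with the sign bit set.
FP4_VALUES = np.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=np.float32,
)

def dequantize_nvfp4(packed, block_scale, tensor_scale, block=16):
    """Dequantize a flat NVFP4 tensor.

    packed:       uint8 array, two FP4 codes per byte ({layer}.weight)
    block_scale:  one scale per 16-element block ({layer}.weight_scale,
                  stored as float8_e4m3fn; passed here as plain floats)
    tensor_scale: per-tensor scalar ({layer}.weight_scale_2)
    Nibble order (low nibble first) is an assumption.
    """
    lo = packed & 0x0F
    hi = packed >> 4
    codes = np.stack([lo, hi], axis=-1).reshape(-1)
    values = FP4_VALUES[codes]
    scales = np.repeat(np.asarray(block_scale, dtype=np.float32), block)
    return values * scales * np.float32(tensor_scale)

# One 16-element block: each byte 0x21 holds codes (1, 2) -> (0.5, 1.0)
out = dequantize_nvfp4(np.full(8, 0x21, dtype=np.uint8),
                       block_scale=[2.0], tensor_scale=0.5)
```

With block scale 2.0 and tensor scale 0.5 the scales cancel, so `out` alternates 0.5 and 1.0.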

πŸ’» Usage in ComfyUI

Requirements

⚠️ NVFP4 requires specific hardware and software!

  • GPU: NVIDIA Blackwell series (RTX 5080 / 5090) - NVFP4 is a Blackwell-exclusive feature
  • PyTorch: 2.9.0+ with CUDA 13.0 (cu130) - older versions do not support NVFP4
  • ComfyUI: Latest version (updated regularly)
  • comfy-kitchen: >= 0.2.7
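A minimal pre-flight check can encode these minimums. The capability threshold is an assumption (consumer Blackwell GPUs report CUDA compute capability 12.x); in a live session you would feed in real values from `torch.__version__` and `torch.cuda.get_device_capability()`.

```python
def meets_requirements(torch_version: str, cuda_capability: tuple) -> bool:
    """Check the documented minimums: PyTorch >= 2.9 and a Blackwell GPU.

    Assumption: Blackwell reports CUDA compute capability 12.x.
    """
    # "2.9.0+cu130" -> (2, 9); ignore the local build suffix
    major_minor = tuple(int(p) for p in torch_version.split("+")[0].split(".")[:2])
    return major_minor >= (2, 9) and cuda_capability[0] >= 12

# Live usage (not run here):
#   import torch
#   meets_requirements(torch.__version__, torch.cuda.get_device_capability())
```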

Recommended Settings

Model: z-image-base-nvfp4_[variant].safetensors
Steps: 28-50
CFG Scale: 3.0-5.0

⚠️ Note: This is Z-Image Base, not Turbo. Use 28-50 steps with CFG guidance, not 8 steps like Turbo.

πŸ“ Model Architecture

Z-Image Base is a 6B parameter diffusion transformer based on the NextDiT architecture:

  • 30 main transformer layers
  • 2 context refiner layers
  • 2 noise refiner layers
  • Hidden dimension: 3840
  • Attention heads: 30
  • Supports CFG (Classifier-Free Guidance)
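A quick sanity check on these numbers in plain Python (sizes in decimal GB, matching the table above):

```python
hidden, heads = 3840, 30
head_dim = hidden // heads            # 128, a common attention head size
print(head_dim)

params = 6.0e9                        # "6B parameters" (approximate)
bf16_gb = params * 2 / 1e9            # 2 bytes per BF16 value
print(f"{bf16_gb:.1f} GB")            # ~12 GB, close to the 12.3 GB checkpoint
```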

πŸ™ Credits

  • Original model: Tongyi-MAI/Z-Image by Alibaba
  • Quantization format: ComfyUI NVFP4 implementation
  • Conversion script: Custom Python script using ComfyUI's TensorCoreNVFP4Layout

πŸ“„ License

Please refer to the original model's license at Tongyi-MAI/Z-Image.
