Hunyuan Image 3.0 Base -- NF4 Quantized (v2)

NF4 (4-bit) quantization of the HunyuanImage-3.0 base model (v2). Fits on a single 48GB GPU. High-quality text-to-image generation with the Hunyuan 3.0 MoE architecture. CFG-distilled for single-pass inference.

What's New in v2

v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

Key Features

  • Text-to-image generation with the Hunyuan 3.0 MoE architecture
  • NF4 quantized -- ~47 GB on disk
  • 45 diffusion steps (CFG-distilled, single-pass)
  • Block swap support -- offload transformer blocks to CPU for lower VRAM
  • ComfyUI ready -- works with Comfy_HunyuanImage3 nodes

VRAM Requirements

Component               Memory
Weight loading          ~29 GB
Inference (additional)  ~10-15 GB
Total                   ~39-44 GB

Recommended Hardware:

  • Single 48GB GPU (RTX 6000 Ada, RTX PRO 5000, A6000)
  • With block swap: may work on 24GB GPUs (swapping ~20 blocks)

Model Details

  • Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
  • Parameters: 80B total, 13B active per token via top-K MoE routing (see the routing sketch after this list)
  • Variant: Base (text-to-image)
  • Quantization: 4-bit NormalFloat (NF4) quantization via bitsandbytes with double quantization
  • Diffusion Steps: 45
  • Default Guidance Scale: 7.0
  • Resolution: Up to 2048x2048
  • Language: English and Chinese prompts
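
The "13B active per token" figure comes from top-K routing: a router scores all 64 experts for each token, but only the K highest-scoring experts actually run. Here is a minimal PyTorch sketch of the mechanism; the hidden size and K are illustrative assumptions, not values stated on this card:

  import torch
  import torch.nn.functional as F

  # Toy sizes; only num_experts (64 per layer, per the quantization notes
  # below) comes from this card. top_k and d_model are assumptions.
  num_experts, top_k, d_model = 64, 8, 512

  tokens = torch.randn(10, d_model)                # 10 tokens
  router = torch.nn.Linear(d_model, num_experts)   # routing network

  logits = router(tokens)                           # (10, 64) expert scores
  weights, expert_ids = logits.topk(top_k, dim=-1)  # keep K experts per token
  weights = F.softmax(weights, dim=-1)              # normalize mixing weights

  # Each token's output is the weighted sum of its K chosen experts'
  # outputs; the remaining experts are never evaluated for that token,
  # which is why only a fraction of the 80B parameters is active at once.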

Quantization Details

Layers quantized to NF4:

  • Feed-forward networks (FFN/MLP layers)
  • Expert layers in MoE architecture (64 experts per layer)
  • Large linear transformations

Kept in full precision (BF16):

  • VAE encoder/decoder (critical for image quality)
  • Attention projection layers (q_proj, k_proj, v_proj, o_proj)
  • Patch embedding layers
  • Time embedding layers
  • Vision model (SigLIP2)
  • Final output layers
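
For reference, here is what this recipe looks like expressed as a bitsandbytes config loaded through transformers. This is a sketch of the approach, not the exact script used to produce this repo; the skip-module names are illustrative stand-ins for the layers listed above:

  import torch
  from transformers import BitsAndBytesConfig

  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_quant_type="nf4",            # 4-bit NormalFloat
      bnb_4bit_use_double_quant=True,       # double quantization, as above
      bnb_4bit_compute_dtype=torch.bfloat16,
      # Modules left in BF16. Actual names depend on the HunyuanImage-3.0
      # implementation and are assumptions here.
      llm_int8_skip_modules=[
          "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
          "patch_embed", "time_embed",             # embedding layers
          "vae", "vision_model", "final_layer",    # VAE, SigLIP2, output head
      ],
  )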

Usage

ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

  cd ComfyUI/custom_nodes
  git clone https://github.com/EricRollei/Comfy_HunyuanImage3
  1. Download this model to your preferred models directory
  2. Use the "Hunyuan 3 V2 Unified" node
  3. Point the model path to this folder and select nf4 precision
  4. Set blocks_to_swap to -1 (auto) or a manual value based on your VRAM
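
Outside ComfyUI, the model should also load through the remote-code interface that ships with the upstream HunyuanImage-3.0 repo. A hedged sketch, assuming Tencent's published trust_remote_code API (generate_image and its prompt argument come from the upstream model card and are untested against this quantized copy):

  from transformers import AutoModelForCausalLM

  # Path is illustrative: point it at your local download of this model.
  model = AutoModelForCausalLM.from_pretrained(
      "path/to/HunyuanImage-3-NF4-v2",
      trust_remote_code=True,
      device_map="auto",
  )

  image = model.generate_image(prompt="A watercolor fox in a misty forest")
  image.save("fox.png")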

Block Swap

Block swap lets the model run on GPUs with less VRAM than the full set of weights requires. The system keeps N transformer blocks in CPU memory and swaps them onto the GPU on demand during each diffusion step.

blocks_to_swap   VRAM Saved   Recommended For
0                0 GB         96GB+ GPU (no swap needed)
4                ~5 GB        80-90GB GPU
8                ~10 GB       64-80GB GPU
16               ~19 GB       48-64GB GPU
-1 (auto)        varies       Let the system decide
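
The table works out to roughly 1.2 GB of VRAM saved per swapped block (16 blocks ≈ 19 GB). A small sketch of how an auto setting could derive blocks_to_swap from free VRAM under that assumption; the per-block size, the ~44 GB requirement, and the block-count cap are estimates taken from this card, not values read from the node:

  import torch

  GB = 1024**3
  GB_PER_BLOCK = 19 / 16   # ~1.2 GB per swapped block, from the table
  REQUIRED_GB = 44         # worst-case total from the VRAM table above
  MAX_BLOCKS = 32          # assumed cap; real block count depends on the model

  def auto_blocks_to_swap() -> int:
      """Estimate how many transformer blocks to offload to CPU."""
      free_bytes, _total = torch.cuda.mem_get_info()
      shortfall = REQUIRED_GB - free_bytes / GB
      if shortfall <= 0:
          return 0         # everything fits; no swapping needed
      return min(MAX_BLOCKS, int(shortfall / GB_PER_BLOCK) + 1)

  print(auto_blocks_to_swap())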

Original Model

This is a quantized derivative of Tencent's HunyuanImage-3.0.

License

This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
