# Hunyuan Image 3.0 Base -- NF4 Quantized (v2)
NF4 (4-bit) quantization of the HunyuanImage-3.0 base model (v2). Fits on a single 48GB GPU. High-quality text-to-image generation with the Hunyuan 3.0 MoE architecture. CFG-distilled for single-pass inference.
## What's New in v2
v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.
## Key Features
- Text-to-image generation with the Hunyuan 3.0 MoE architecture
- NF4 quantized -- ~47 GB on disk
- 45 diffusion steps (CFG-distilled, single-pass)
- Block swap support -- offload transformer blocks to CPU for lower VRAM
- ComfyUI ready -- works with Comfy_HunyuanImage3 nodes
## VRAM Requirements
| Component | Memory |
|---|---|
| Weight loading | ~29 GB |
| Inference (additional) | ~10-15 GB |
| Total | ~39-44 GB |
**Recommended Hardware:**
- Single 48GB GPU (RTX 6000 Ada, RTX PRO 5000, A6000)
- With block swap: may work on 24GB GPUs (swapping ~20 blocks)
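As a quick sanity check, the table's arithmetic (together with the 48 GB recommended card) works out to a few GB of worst-case headroom:

```python
# Rough VRAM budget for the NF4 model (figures from the table above).
WEIGHTS_GB = 29          # quantized weights resident on GPU
INFERENCE_GB = (10, 15)  # additional inference memory range

GPU_GB = 48              # single recommended card

total_low = WEIGHTS_GB + INFERENCE_GB[0]
total_high = WEIGHTS_GB + INFERENCE_GB[1]
headroom = GPU_GB - total_high

print(f"Total: {total_low}-{total_high} GB, worst-case headroom: {headroom} GB")
```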
## Model Details
- Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- Parameters: 80B total, 13B active per token (top-K MoE routing)
- Variant: Base (text-to-image)
- Quantization: 4-bit NormalFloat (NF4) quantization via bitsandbytes with double quantization
- Diffusion Steps: 45
- Default Guidance Scale: 7.0
- Resolution: Up to 2048x2048
- Language: English and Chinese prompts
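The top-K routing mentioned above can be sketched generically in NumPy. The 64-expert width matches this card; `k = 2` here is purely illustrative, not the model's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_route(router_logits: np.ndarray, k: int):
    """Pick the k highest-scoring experts per token and renormalize their
    gate weights (a generic top-K MoE router, not Hunyuan's exact code)."""
    top = np.argsort(router_logits, axis=-1)[..., -k:]         # indices of k best experts
    gates = np.take_along_axis(router_logits, top, axis=-1)
    gates = np.exp(gates - gates.max(axis=-1, keepdims=True))  # softmax over the selected k
    gates /= gates.sum(axis=-1, keepdims=True)
    return top, gates

# 4 tokens routed over 64 experts, 2 experts active per token (illustrative)
logits = rng.standard_normal((4, 64))
experts, gates = top_k_route(logits, k=2)
print(experts.shape, gates.sum(axis=-1))  # (4, 2); each row's gates sum to 1
```

Only the selected experts' FFNs run for a given token, which is why 80B total parameters reduce to ~13B active per token.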
## Quantization Details

**Layers quantized to NF4:**
- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations
**Kept in full precision (BF16):**
- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
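For intuition, here is a toy NumPy sketch of the NF4 scheme: per-block absmax scaling plus snapping to the 16 fixed NormalFloat4 code values (from the QLoRA paper). Real bitsandbytes additionally quantizes the per-block scales themselves ("double quantization"), which is omitted here:

```python
import numpy as np

# The 16 NormalFloat4 code values (quantiles of a standard normal, per QLoRA).
NF4_CODES = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.4407098591327667, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def nf4_quantize(block: np.ndarray):
    """Quantize one weight block: scale to [-1, 1] by absmax, then snap
    each value to the nearest NF4 code. Returns 4-bit indices + the scale."""
    scale = np.abs(block).max() or 1.0
    idx = np.abs(block[:, None] / scale - NF4_CODES[None, :]).argmin(axis=1)
    return idx.astype(np.uint8), scale

def nf4_dequantize(idx: np.ndarray, scale: float) -> np.ndarray:
    return NF4_CODES[idx] * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(64).astype(np.float32) * 0.02  # one 64-value weight block
idx, scale = nf4_quantize(w)
w_hat = nf4_dequantize(idx, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

Each weight stores only a 4-bit index plus a shared per-block scale, which is where the roughly 4x size reduction over BF16 comes from.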
## Usage

### ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

```shell
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```
1. Download this model to your preferred models directory
2. Use the "Hunyuan 3 V2 Unified" node
3. Point the model path to this folder and select `nf4` precision
4. Set `blocks_to_swap` to -1 (auto) or a manual value based on your VRAM
## Block Swap

Block swap allows running models (including the INT8 and BF16 variants) on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them onto the GPU on demand during each diffusion step.
| blocks_to_swap | VRAM Saved | Recommended For |
|---|---|---|
| 0 | 0 GB | 96GB+ GPU (no swap needed) |
| 4 | ~5 GB | 80-90GB GPU |
| 8 | ~10 GB | 64-80GB GPU |
| 16 | ~19 GB | 48-64GB GPU |
| -1 (auto) | varies | Let the system decide |
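The swap-on-demand behavior can be illustrated with a toy simulation (block counts and names are illustrative, not the node's internals): with `blocks_to_swap = 16` out of 32 blocks, at most resident + 1 blocks ever occupy the GPU at once:

```python
# Toy simulation of block swap: the last `blocks_to_swap` transformer
# blocks live on CPU and are moved to GPU only while they execute.

class Block:
    def __init__(self, i, device):
        self.i, self.device = i, device

def forward_pass(blocks, peak=0):
    resident = sum(b.device == "cuda" for b in blocks)  # blocks pinned on GPU
    for b in blocks:
        if b.device == "cpu":
            b.device = "cuda"               # swap in on demand
            peak = max(peak, resident + 1)
            b.device = "cpu"                # swap back out after the block runs
        else:
            peak = max(peak, resident)
    return peak

TOTAL, SWAP = 32, 16                        # e.g. blocks_to_swap = 16
blocks = [Block(i, "cpu" if i >= TOTAL - SWAP else "cuda") for i in range(TOTAL)]
peak = forward_pass(blocks)
print(f"peak GPU-resident blocks: {peak} of {TOTAL}")  # 17 of 32
```

The trade-off is PCIe transfer time per swapped block on every diffusion step, which is why fewer swapped blocks are recommended as GPU memory grows.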
## Original Model
This is a quantized derivative of Tencent's HunyuanImage-3.0.
- License: Tencent Hunyuan Community License
## Credits
- Original Model: Tencent Hunyuan Team
- Quantization: Eric Rollei
- ComfyUI Integration: Comfy_HunyuanImage3
## License
This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License