Hunyuan Image 3.0 Instruct Distil -- NF4 Quantized (v2)

NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct Distil model (v2). This is the most accessible variant: it fits on a single 48GB GPU and generates images ~6x faster than the full model (8 steps vs. 50), offering the best balance of speed, quality, and VRAM.

What's New in v2

v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

Key Features

  • Instruct model -- supports text-to-image, image editing, multi-image fusion
  • Chain-of-Thought -- built-in think_recaption mode for highest quality
  • NF4 quantized -- ~48 GB on disk
  • 8 diffusion steps (CFG-distilled)
  • Block swap support -- offload transformer blocks to CPU for lower VRAM
  • ComfyUI ready -- works with Comfy_HunyuanImage3 nodes

VRAM Requirements

Component                 Memory
Weight loading            ~29 GB
Inference (additional)    ~12-20 GB
Total                     ~41-49 GB

Recommended Hardware:

  • Single 48GB GPU (RTX 6000 Ada, RTX PRO 5000, A6000)
  • With block swap: may work on 24GB GPUs (swapping ~20 blocks)
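The on-disk and in-memory figures can be sanity-checked with simple storage arithmetic. A minimal sketch, assuming the 80B parameter count from the model details below; the ~7% BF16 fraction is a fitted illustration of how mixed precision lands near the ~48 GB on-disk figure, not a measured split:

```python
# Back-of-envelope storage arithmetic for an 80B-parameter model.
# The bf16_frac value is an illustrative assumption, not a measured number.

TOTAL_PARAMS = 80e9

def storage_gb(params, bits_per_param):
    """Storage needed for `params` weights at a given bit width, in GB."""
    return params * bits_per_param / 8 / 1e9

pure_nf4_gb = storage_gb(TOTAL_PARAMS, 4)   # 40.0 GB if every layer were NF4
bf16_frac = 0.07                            # assumed share of weights kept in BF16
mixed_gb = (storage_gb(TOTAL_PARAMS * (1 - bf16_frac), 4)
            + storage_gb(TOTAL_PARAMS * bf16_frac, 16))  # ~48.4 GB
```

Runtime VRAM is lower than disk size because activations, KV caches, and offloaded tensors shift the balance; the table above reflects measured behavior rather than this arithmetic.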

Model Details

  • Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
  • Parameters: 80B total, 13B active per token (top-K MoE routing)
  • Variant: Instruct Distil (CFG-Distilled, 8-step)
  • Quantization: 4-bit NormalFloat (NF4) quantization via bitsandbytes with double quantization
  • Diffusion Steps: 8
  • Default Guidance Scale: 2.5
  • Resolution: Up to 2048x2048
  • Language: English and Chinese prompts
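The 80B-total / 13B-active figure comes from top-K MoE routing: each token activates only a few experts per layer. A minimal sketch of that routing step, using 8 experts for readability (the quantization notes below list 64 per layer) and an illustrative k=2:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Top-K routing: pick the k highest-scoring experts for one token and
    renormalize their gate weights so they sum to 1."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))

# One token's router scores over 8 experts (values are made up):
logits = [0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4]
picks = route_token(logits, k=2)  # only these experts' weights are touched
```

Because only the chosen experts run per token, compute scales with the active parameter count (13B), not the total (80B).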

Distillation

This is the CFG-Distilled variant:

  • Only 8 diffusion steps needed (vs 50 for the full Instruct model)
  • ~6x faster image generation
  • No quality loss -- distilled to match the full model's output
  • cfg_distilled: true means no classifier-free guidance needed
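The speedup arithmetic can be sketched in a few lines. The ~6x figure comes from step count alone (50/8 = 6.25); if the full model also samples with classifier-free guidance (an assumption here, not stated in this card), dropping CFG removes a second forward pass per step on top of that:

```python
def forward_passes(steps, uses_cfg):
    """Model forward passes for one image: CFG runs a conditional and an
    unconditional pass per step; a CFG-distilled model runs just one."""
    return steps * (2 if uses_cfg else 1)

full = forward_passes(50, uses_cfg=True)       # 100 passes
distilled = forward_passes(8, uses_cfg=False)  # 8 passes
step_speedup = 50 / 8                          # 6.25, the ~6x quoted above
```

Wall-clock speedup will differ somewhat from pass counts, since per-pass cost is not perfectly constant.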

Quantization Details

Layers quantized to NF4:

  • Feed-forward networks (FFN/MLP layers)
  • Expert layers in MoE architecture (64 experts per layer)
  • Large linear transformations

Kept in full precision (BF16):

  • VAE encoder/decoder (critical for image quality)
  • Attention projection layers (q_proj, k_proj, v_proj, o_proj)
  • Patch embedding layers
  • Time embedding layers
  • Vision model (SigLIP2)
  • Final output layers
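The split above maps onto a bitsandbytes 4-bit configuration. A sketch of what such a config could look like via `transformers.BitsAndBytesConfig`; the skip-module names listed are assumptions based on the layer names above, not values taken from this model's actual config:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative NF4 config reflecting the layer split described above.
# The llm_int8_skip_modules entries are assumed names, kept here in BF16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,          # double quantization, as noted above
    bnb_4bit_compute_dtype=torch.bfloat16,
    llm_int8_skip_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "patch_embed", "time_embed", "vision_model",
    ],
)
```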

Usage

ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes. Install them into your ComfyUI custom_nodes directory:

cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
  1. Download this model to your preferred models directory
  2. Use the "Hunyuan 3 Instruct Loader" node
  3. Select this model folder and choose nf4 precision
  4. Connect to the "Hunyuan 3 Instruct Generate" node for text-to-image
  5. Or use "Hunyuan 3 Instruct Edit" for image editing
  6. Or use "Hunyuan 3 Instruct Multi-Fusion" for combining multiple images

Bot Task Modes

The Instruct model supports three generation modes:

Mode             Description                                                       Speed
image            Direct text-to-image; the prompt is used as-is                    Fastest
recaption        Model rewrites the prompt into a detailed description, then generates  Medium
think_recaption  CoT reasoning -> prompt enhancement -> generation (best quality)  Slowest
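The three modes differ only in the stages run before sampling. A minimal sketch of the dispatch; stage names are descriptive placeholders, not actual node or API names:

```python
def pipeline_stages(mode):
    """Map a bot_task mode to the (illustrative) stages run before sampling."""
    stages = {
        "image": ["generate"],
        "recaption": ["rewrite_prompt", "generate"],
        "think_recaption": ["reason", "rewrite_prompt", "generate"],
    }
    return stages[mode]
```

Each extra stage is another LLM pass over the prompt, which is why quality and latency rise together down the table.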

Block Swap

Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them to GPU on demand during each diffusion step.

blocks_to_swap   VRAM saved   Recommended for
0                0 GB         96GB+ GPU (no swap needed)
4                ~5 GB        80-90GB GPU
8                ~10 GB       64-80GB GPU
16               ~19 GB       48-64GB GPU
-1 (auto)        varies       Let the system decide
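The trade-off above can be modeled in a few lines: each offloaded block saves its weight footprint but costs one host-to-device copy per diffusion step. A toy sketch; the 32-block total and the ~1.2 GB-per-block figure (implied by the table's ~10 GB / 8 blocks row) are assumptions for illustration:

```python
PER_BLOCK_GB = 1.2  # rough per-block weight size implied by the table above

def swap_plan(total_blocks, blocks_to_swap):
    """Estimate VRAM saved and per-step H2D copies when the first
    `blocks_to_swap` transformer blocks live on CPU and stream in on demand."""
    offloaded = set(range(blocks_to_swap))
    copies_per_step = sum(1 for b in range(total_blocks) if b in offloaded)
    return {
        "vram_saved_gb": blocks_to_swap * PER_BLOCK_GB,
        "h2d_copies_per_step": copies_per_step,
    }

plan = swap_plan(total_blocks=32, blocks_to_swap=8)  # ~9.6 GB saved, 8 copies/step
```

More swapped blocks mean more PCIe traffic per step, so generation slows as VRAM savings grow.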

Original Model

This is a quantized derivative of Tencent's HunyuanImage-3.0 Instruct.

License

This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
