Hunyuan Image 3.0 Instruct Distil -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 Instruct Distil model (v2). CFG-distilled for ~6x faster generation (8 steps vs 50). Same quality as the full Instruct model with dramatically faster inference.

What's New in v2

v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.
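Skip-module selection of this kind is essentially a name-pattern filter over the model's modules. Below is a minimal sketch; the patterns are illustrative stand-ins (the real list lives in the quantization config), covering the module groups described under "Quantization Details":

```python
from fnmatch import fnmatch

# Illustrative skip patterns (hypothetical names, not the exact config):
# attention projections, embeddings, VAE, vision tower, output layers.
SKIP_PATTERNS = [
    "*.q_proj", "*.k_proj", "*.v_proj", "*.o_proj",
    "*patch_embed*", "*time_embed*",
    "vae.*", "vision_model.*", "*final_layer*",
]

def precision_for(module_name: str) -> str:
    """Return 'bf16' for skipped (quality-critical) modules, 'int8' otherwise."""
    if any(fnmatch(module_name, pat) for pat in SKIP_PATTERNS):
        return "bf16"
    return "int8"
```

Everything the filter does not catch (FFN and expert weights, i.e. most of the 80B parameters) is quantized, which is why the skips cost little disk space but help image quality.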

Key Features

  • Instruct model -- supports text-to-image, image editing, multi-image fusion
  • Chain-of-Thought -- built-in think_recaption mode for highest quality
  • INT8 quantized -- ~83 GB on disk
  • 8 diffusion steps (CFG-distilled)
  • Block swap support -- offload transformer blocks to CPU for lower VRAM
  • ComfyUI ready -- works with Comfy_HunyuanImage3 nodes

VRAM Requirements

Component               Memory
Weight loading          ~80 GB
Inference (additional)  ~12-20 GB
Total                   ~92-100 GB

Recommended Hardware:

  • NVIDIA RTX 6000 Blackwell (96GB) -- fits entirely with headroom
  • With block swap (4-8 blocks): fits on 64-80GB GPUs
  • NVIDIA RTX 6000 Ada (48GB) -- requires significant block swap

Model Details

  • Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
  • Parameters: 80B total, 13B active per token (top-K MoE routing)
  • Variant: Instruct Distil (CFG-Distilled, 8-step)
  • Quantization: INT8 per-channel quantization via bitsandbytes
  • Diffusion Steps: 8
  • Default Guidance Scale: 2.5
  • Resolution: Up to 2048x2048
  • Language: English and Chinese prompts
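The 80B-total / 13B-active split comes from top-K MoE routing: a router scores all experts per token, but only the top K actually run. A schematic sketch with scalar "experts" (real experts are FFN sub-networks, and K and the expert count here are illustrative):

```python
import math

def topk_moe(x, gate_logits, experts, k=2):
    """Schematic top-K MoE routing: only k of the experts run per token."""
    # softmax over the router logits (numerically stabilized)
    m = max(gate_logits)
    probs = [math.exp(g - m) for g in gate_logits]
    s = sum(probs)
    probs = [p / s for p in probs]
    # select the k highest-probability experts and renormalize their weights
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # weighted sum of only the selected experts' outputs
    return sum(probs[i] / norm * experts[i](x) for i in top)
```

Because only K experts execute per token, compute scales with the active parameters (13B), not the total (80B), while the quantized weights for all experts must still fit in memory.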

Distillation

This is the CFG-Distilled variant:

  • Only 8 diffusion steps needed (vs 50 for the full Instruct model)
  • ~6x faster image generation
  • No quality loss -- distilled to match the full model's output
  • cfg_distilled: true means no classifier-free guidance needed
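The saving is easy to see by counting model evaluations. Standard classifier-free guidance runs the model twice per step (conditional and unconditional) and mixes the two predictions; a CFG-distilled model has the guidance baked into its weights, so each step is a single call. A sketch (the call-count ratio is larger than the ~6x figure above, which is wall-clock and includes fixed overheads):

```python
def sample_with_cfg(model, steps=50, guidance_scale=7.5):
    """Standard CFG: two model evaluations per step (cond + uncond)."""
    calls = 0
    for _ in range(steps):
        cond, uncond = model("prompt"), model("")
        calls += 2
        _ = uncond + guidance_scale * (cond - uncond)  # guided prediction
    return calls

def sample_distilled(model, steps=8):
    """CFG-distilled: guidance is baked into the weights, one call per step."""
    calls = 0
    for _ in range(steps):
        _ = model("prompt")
        calls += 1
    return calls
```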

Quantization Details

Layers quantized to INT8:

  • Feed-forward networks (FFN/MLP layers)
  • Expert layers in MoE architecture (64 experts per layer)
  • Large linear transformations

Kept in full precision (BF16):

  • VAE encoder/decoder (critical for image quality)
  • Attention projection layers (q_proj, k_proj, v_proj, o_proj)
  • Patch embedding layers
  • Time embedding layers
  • Vision model (SigLIP2)
  • Final output layers
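Per-channel INT8 quantization assigns one scale per output channel (row) of a weight matrix, chosen so that the channel's largest absolute value maps to 127. A simplified pure-Python sketch of the absmax scheme (bitsandbytes' actual kernels add refinements such as outlier handling that are omitted here):

```python
def quantize_int8_per_channel(weight):
    """Absmax INT8: one scale per output channel (row) of the weight matrix."""
    q_rows, scales = [], []
    for row in weight:
        scale = max(abs(v) for v in row) / 127.0 or 1.0  # avoid div-by-zero
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales

def dequantize(q_rows, scales):
    """Recover approximate BF16-style values: q * scale per row."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]
```

Per-channel scales matter because channels with small weights would otherwise be crushed by a single tensor-wide scale set by the largest channel.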

Usage

ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

  cd ComfyUI/custom_nodes
  git clone https://github.com/EricRollei/Comfy_HunyuanImage3

Then, in ComfyUI:

  1. Download this model to your preferred models directory
  2. Use the "Hunyuan 3 Instruct Loader" node
  3. Select this model folder and choose int8 precision
  4. Connect to the "Hunyuan 3 Instruct Generate" node for text-to-image
  5. Or use "Hunyuan 3 Instruct Edit" for image editing
  6. Or use "Hunyuan 3 Instruct Multi-Fusion" for combining multiple images

Bot Task Modes

The Instruct model supports three generation modes:

Mode             Description                                                       Speed
image            Direct text-to-image; prompt used as-is                           Fastest
recaption        Model rewrites the prompt into a detailed description first       Medium
think_recaption  CoT reasoning -> prompt enhancement -> generation (best quality)  Slowest

Block Swap

Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them to GPU on demand during each diffusion step.

blocks_to_swap  VRAM Saved  Recommended For
0               0 GB        96 GB+ GPU (no swap needed)
4               ~10 GB      80-90 GB GPUs
8               ~20 GB      64-80 GB GPUs
16              ~40 GB      48-64 GB GPUs
-1 (auto)       varies      Let the system decide
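From the table, each swapped block saves roughly 2.5 GB of VRAM, paid for with one host-to-device weight copy per swapped block per diffusion step. A schematic of the schedule, counting those copies (the block count of 32 is illustrative, not the model's actual depth):

```python
def count_swap_transfers(num_blocks=32, blocks_to_swap=8, steps=8):
    """Count CPU->GPU weight copies for a schematic block-swap schedule:
    the last `blocks_to_swap` blocks live on CPU and are streamed in
    once per diffusion step; eviction is just freeing GPU memory,
    since the weights are read-only during inference."""
    resident = set(range(num_blocks - blocks_to_swap))  # blocks pinned on GPU
    transfers = 0
    for _ in range(steps):
        for block in range(num_blocks):
            if block not in resident:
                transfers += 1  # stream this block's weights CPU -> GPU
            # ... forward pass through the block ...
    return transfers
```

The cost therefore grows linearly with both `blocks_to_swap` and the step count, which is why the 8-step distilled model pairs well with aggressive swapping.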

Original Model

This is a quantized derivative of Tencent's HunyuanImage-3.0 Instruct.

Credits

License

This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
