Hunyuan Image 3.0 Instruct -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 Instruct model (v2). Supports text-to-image, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement (recaption/think_recaption).

What's New in v2

v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

Key Features

  • Instruct model -- supports text-to-image, image editing, multi-image fusion
  • Chain-of-Thought -- built-in think_recaption mode for highest quality
  • INT8 quantized -- ~83 GB on disk
  • 50 diffusion steps (full quality)
  • Block swap support -- offload transformer blocks to CPU for lower VRAM
  • ComfyUI ready -- works with Comfy_HunyuanImage3 nodes

VRAM Requirements

Component                Memory
Weight loading           ~80 GB
Inference (additional)   ~12-20 GB
Total                    ~92-100 GB

Recommended Hardware:

  • NVIDIA RTX 6000 Blackwell (96GB) -- fits entirely with headroom
  • With block swap (4-8 blocks): fits on 64-80GB GPUs
  • NVIDIA RTX 6000 Ada (48GB) -- requires significant block swap
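The sizing guidance above can be turned into a rough planner. This is a minimal sketch using the figures from this card (~80 GB weights, ~12-20 GB inference overhead) plus an assumed ~2.5 GB saved per swapped block; the helper name and constants are illustrative estimates, not measurements.

```python
import math

# Rough VRAM planner: how many transformer blocks must be swapped to CPU
# to fit on a given GPU. weights_gb and overhead_gb come from the table
# above; gb_per_block (~2.5 GB) is an assumption consistent with the
# block-swap table later in this card.
def blocks_needed(gpu_vram_gb: float,
                  weights_gb: float = 80.0,
                  overhead_gb: float = 16.0,
                  gb_per_block: float = 2.5) -> int:
    """Smallest blocks_to_swap value that fits the model in gpu_vram_gb."""
    deficit = weights_gb + overhead_gb - gpu_vram_gb
    return 0 if deficit <= 0 else math.ceil(deficit / gb_per_block)
```

For example, a 96 GB card needs no swapping, while an 86 GB card would need roughly 4 blocks swapped.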

Model Details

  • Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
  • Parameters: 80B total, 13B active per token (top-K MoE routing)
  • Variant: Instruct (Full)
  • Quantization: INT8 per-channel quantization via bitsandbytes
  • Diffusion Steps: 50
  • Default Guidance Scale: 2.5
  • Resolution: Up to 2048x2048
  • Language: English and Chinese prompts

Quantization Details

Layers quantized to INT8:

  • Feed-forward networks (FFN/MLP layers)
  • Expert layers in MoE architecture (64 experts per layer)
  • Large linear transformations

Kept in full precision (BF16):

  • VAE encoder/decoder (critical for image quality)
  • Attention projection layers (q_proj, k_proj, v_proj, o_proj)
  • Patch embedding layers
  • Time embedding layers
  • Vision model (SigLIP2)
  • Final output layers
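The skip-module selection above amounts to a name-matching rule applied while quantizing: any module whose name matches a skip pattern stays in BF16, everything else goes to INT8. A minimal sketch of that rule (the substring patterns are illustrative, not the exact checkpoint keys):

```python
# Modules matching these substrings are kept in BF16; all other large
# linear layers (FFN/MLP, MoE experts) are quantized to INT8.
# Patterns are illustrative; real checkpoint key names may differ.
SKIP_PATTERNS = (
    "vae",                                    # VAE encoder/decoder
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
    "patch_embed",                            # patch embedding
    "time_embed",                             # time embedding
    "vision_model",                           # SigLIP2 vision tower
    "final_layer",                            # final output layers
)

def should_quantize(module_name: str) -> bool:
    """True if the module is quantized to INT8, False if kept in BF16."""
    return not any(p in module_name for p in SKIP_PATTERNS)
```

So an MoE expert projection would be quantized, while an attention `q_proj` or any VAE module would not.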

Usage

ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes. Install them first:

cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3

Then, in ComfyUI:
  1. Download this model to your preferred models directory
  2. Use the "Hunyuan 3 Instruct Loader" node
  3. Select this model folder and choose int8 precision
  4. Connect to the "Hunyuan 3 Instruct Generate" node for text-to-image
  5. Or use "Hunyuan 3 Instruct Edit" for image editing
  6. Or use "Hunyuan 3 Instruct Multi-Fusion" for combining multiple images

Bot Task Modes

The Instruct model supports three generation modes:

Mode              Description                                                        Speed
image             Direct text-to-image; the prompt is used as-is                     Fastest
recaption         Model rewrites the prompt into a detailed description, then generates   Medium
think_recaption   CoT reasoning -> prompt enhancement -> generation (best quality)   Slowest
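The three modes differ only in which stages run before image generation, which is why they trade speed for quality. A toy sketch of that relationship (stage names are illustrative labels, not actual API calls):

```python
# Each bot_task mode runs a superset of the previous mode's stages.
# Stage names are illustrative, not real method names.
PIPELINES = {
    "image": ("generate",),
    "recaption": ("rewrite_prompt", "generate"),
    "think_recaption": ("reason", "rewrite_prompt", "generate"),
}

def stages(bot_task: str) -> tuple:
    """Return the ordered stages executed for a given bot_task mode."""
    if bot_task not in PIPELINES:
        raise ValueError(f"unknown bot_task: {bot_task!r}")
    return PIPELINES[bot_task]
```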

Block Swap

Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them to GPU on demand during each diffusion step.

blocks_to_swap   VRAM Saved   Recommended For
0                0 GB         96 GB+ GPU (no swap needed)
4                ~10 GB       80-90 GB GPU
8                ~20 GB       64-80 GB GPU
16               ~40 GB       48-64 GB GPU
-1 (auto)        varies       Let the system decide
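The schedule described above — keep the last N blocks on CPU and move each one to the GPU just as the forward pass reaches it — can be sketched without any framework. This is a minimal illustration of the placement and swap plan, assuming devices are plain strings; a real implementation would move weights with PyTorch's `tensor.to(device)`.

```python
# Minimal block-swap sketch: the last `blocks_to_swap` transformer blocks
# start on the CPU and are swapped onto the GPU one at a time during the
# forward pass of each diffusion step.
def plan_devices(num_blocks: int, blocks_to_swap: int) -> list:
    """Initial device placement for each transformer block."""
    resident = num_blocks - blocks_to_swap
    return ["cuda" if i < resident else "cpu" for i in range(num_blocks)]

def forward_schedule(devices: list):
    """Yield (block_index, needs_swap) pairs in forward-pass order;
    needs_swap is True when the block must be copied to the GPU first."""
    for i, dev in enumerate(devices):
        yield i, dev == "cpu"
```

With 32 blocks and blocks_to_swap=8, the first 24 blocks stay resident and only the last 8 incur a CPU-to-GPU copy per step, which is where the extra latency of block swap comes from.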

Original Model

This is a quantized derivative of Tencent's HunyuanImage-3.0 Instruct.

License

This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
