Hunyuan Image 3.0 Base -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 base model (v2). High-quality text-to-image generation with the Hunyuan 3.0 Diffusion Transformer + Mixture-of-Experts architecture. CFG-distilled for single-pass inference.

What's New in v2

v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

Key Features

  • Text-to-image generation with the Hunyuan 3.0 MoE architecture
  • INT8 quantized -- ~82 GB on disk
  • 45 diffusion steps (CFG-distilled, single-pass)
  • Block swap support -- offload transformer blocks to CPU for lower VRAM
  • ComfyUI ready -- works with Comfy_HunyuanImage3 nodes

VRAM Requirements

Component                Memory
Weight loading           ~80 GB
Inference (additional)   ~10-15 GB
Total                    ~90-95 GB

Recommended Hardware:

  • NVIDIA RTX 6000 Blackwell (96GB) -- fits entirely with headroom
  • With block swap (4-8 blocks): fits on 64-80GB GPUs
  • NVIDIA RTX 6000 Ada (48GB) -- requires significant block swap
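The numbers above can be combined into a rough back-of-the-envelope estimator. This is a sketch, not a measured profile: the ~2.5 GB-per-block figure is inferred from the block-swap table later in this card (4 blocks ≈ 10 GB saved), and the ~12.5 GB inference overhead is the midpoint of the 10-15 GB range.

```python
def estimate_vram_gb(blocks_to_swap: int,
                     weight_gb: float = 80.0,      # INT8 weights on GPU
                     inference_gb: float = 12.5,   # activations, KV, workspace
                     per_block_gb: float = 2.5) -> float:
    """Approximate peak VRAM for this checkpoint.

    Each swapped block's weights live on CPU instead of GPU,
    so every swapped block saves roughly `per_block_gb`.
    """
    return weight_gb + inference_gb - blocks_to_swap * per_block_gb
```

For example, `estimate_vram_gb(0)` lands in the ~90-95 GB range quoted above, while swapping 8 blocks brings the estimate near the 64-80 GB class of GPUs.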

Model Details

  • Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
  • Parameters: 80B total, 13B active per token (top-K MoE routing)
  • Variant: Base (text-to-image)
  • Quantization: INT8 per-channel quantization via bitsandbytes
  • Diffusion Steps: 45
  • Default Guidance Scale: 7.0
  • Resolution: Up to 2048x2048
  • Language: English and Chinese prompts
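The "13B active per token" figure comes from top-K expert routing: a gating network scores all 64 experts for each token, and only the K highest-scoring experts' FFNs actually run. A toy NumPy sketch of that routing step (the value of K and all shapes here are illustrative, not Hunyuan's actual configuration):

```python
import numpy as np

def route_tokens(hidden, gate_w, k=8):
    """Toy top-K MoE router.

    hidden: (tokens, dim) token representations
    gate_w: (dim, num_experts) gating weights
    Returns, per token, the indices of the k chosen experts and
    softmax mixing weights over just those k experts.
    """
    logits = hidden @ gate_w                       # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]     # k highest-scoring experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # mixing weights sum to 1
    return topk, w
```

Because only K of the 64 experts execute per token, compute scales with the active parameter count (13B) rather than the full 80B, even though all experts' weights must still be resident, which is why quantizing the expert layers dominates the disk/VRAM savings.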

Quantization Details

Layers quantized to INT8:

  • Feed-forward networks (FFN/MLP layers)
  • Expert layers in MoE architecture (64 experts per layer)
  • Large linear transformations

Kept in full precision (BF16):

  • VAE encoder/decoder (critical for image quality)
  • Attention projection layers (q_proj, k_proj, v_proj, o_proj)
  • Patch embedding layers
  • Time embedding layers
  • Vision model (SigLIP2)
  • Final output layers
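Skip-module selection like the list above typically works by substring-matching parameter names against a keep-in-BF16 list. A minimal sketch of that filter; the pattern strings are assumptions mirroring the bullets, not the exact parameter names in this checkpoint:

```python
# Hypothetical patterns mirroring the BF16 skip list above.
SKIP_PATTERNS = (
    "vae",                                    # VAE encoder/decoder
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
    "patch_embed", "time_embed",              # embedding layers
    "vision",                                 # SigLIP2 vision tower
    "final_layer", "lm_head",                 # final output layers
)

def keep_bf16(module_name: str) -> bool:
    """True if a module should stay in BF16 rather than be quantized."""
    return any(p in module_name for p in SKIP_PATTERNS)
```

Under this scheme, MoE expert FFNs and other large linears fall through the filter and get quantized to INT8, while anything matching a pattern is loaded in full precision.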

Usage

ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3

After installing the nodes:

  1. Download this model to your preferred models directory
  2. Use the "Hunyuan 3 V2 Unified" node
  3. Point the model path to this folder and select int8 precision
  4. Set blocks_to_swap to -1 (auto) or a manual value based on your VRAM

Block Swap

Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them to GPU on demand during each diffusion step.

blocks_to_swap   VRAM Saved   Recommended For
0                0 GB         96GB+ GPU (no swap needed)
4                ~10 GB       80-90GB GPU
8                ~20 GB       64-80GB GPU
16               ~40 GB       48-64GB GPU
-1 (auto)        varies       Let the system decide
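The swap mechanic described above can be sketched without any GPU: resident blocks stay on the device, while each swapped block is copied in just before its forward pass and evicted right after. This is a simplified illustration of the idea, not this node pack's implementation; the `Block` class and device strings are stand-ins for real transformer blocks and `torch` device moves.

```python
class Block:
    """Stand-in for a transformer block; tracks its current device."""
    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

    def forward(self, x):
        assert self.device == "cuda", "block must be on GPU to run"
        return x  # a real block would compute attention + MoE FFN here

def run_step(blocks, x, blocks_to_swap):
    """One diffusion step with the last `blocks_to_swap` blocks kept on CPU.

    Resident blocks are assumed to already live on the GPU; swapped
    blocks are shuttled host->device just in time, then evicted so at
    most one swapped block occupies VRAM at a time.
    """
    resident = len(blocks) - blocks_to_swap
    for i, blk in enumerate(blocks):
        swapped = i >= resident
        if swapped:
            blk.to("cuda")    # H2D copy just before use
        x = blk.forward(x)
        if swapped:
            blk.to("cpu")     # free VRAM for the next swapped block
    return x
```

The trade-off is bandwidth for memory: every diffusion step re-transfers the swapped blocks' weights over PCIe, so larger `blocks_to_swap` values lower peak VRAM but slow each step.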

Original Model

This is a quantized derivative of Tencent's HunyuanImage-3.0.

License

This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
