Hunyuan Image 3.0 Base -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 base model (v2). High-quality text-to-image generation with the Hunyuan 3.0 Diffusion Transformer + Mixture-of-Experts architecture. CFG-distilled for single-pass inference.

What's New in v2

v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

Key Features

  • Text-to-image generation with the Hunyuan 3.0 MoE architecture
  • INT8 quantized -- ~82 GB on disk
  • 45 diffusion steps (CFG-distilled, single-pass)
  • Block swap support -- offload transformer blocks to CPU for lower VRAM
  • ComfyUI ready -- works with Comfy_HunyuanImage3 nodes

VRAM Requirements

Component                Memory
Weight loading           ~80 GB
Inference (additional)   ~10-15 GB
Total                    ~90-95 GB

Recommended Hardware:

  • NVIDIA RTX 6000 Blackwell (96GB) -- fits entirely with headroom
  • With block swap (4-8 blocks): fits on 64-80GB GPUs
  • NVIDIA RTX 6000 Ada (48GB) -- requires significant block swap
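The numbers above can be combined into a rough back-of-the-envelope estimator. This is a sketch, not a measured profile: the ~2.5 GB-per-block figure is inferred from the block-swap table later in this card (4 blocks ≈ 10 GB saved), and the ~12.5 GB inference overhead is the midpoint of the 10-15 GB range.

```python
def estimate_vram_gb(blocks_to_swap: int,
                     weight_gb: float = 80.0,      # INT8 weights on GPU
                     inference_gb: float = 12.5,   # activations, KV, workspace
                     per_block_gb: float = 2.5) -> float:
    """Approximate peak VRAM for this checkpoint.

    Each swapped block's weights live on CPU instead of GPU,
    so every swapped block saves roughly `per_block_gb`.
    """
    return weight_gb + inference_gb - blocks_to_swap * per_block_gb
```

For example, `estimate_vram_gb(0)` lands in the ~90-95 GB range quoted above, while swapping 8 blocks brings the estimate near the 64-80 GB class of GPUs.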

Model Details

  • Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
  • Parameters: 80B total, 13B active per token (top-K MoE routing)
  • Variant: Base (text-to-image)
  • Quantization: INT8 per-channel quantization via bitsandbytes
  • Diffusion Steps: 45
  • Default Guidance Scale: 7.0
  • Resolution: Up to 2048x2048
  • Language: English and Chinese prompts
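The "13B active per token" figure comes from top-K expert routing: a gating network scores all 64 experts for each token, and only the K highest-scoring experts' FFNs actually run. A toy NumPy sketch of that routing step (the value of K and all shapes here are illustrative, not Hunyuan's actual configuration):

```python
import numpy as np

def route_tokens(hidden, gate_w, k=8):
    """Toy top-K MoE router.

    hidden: (tokens, dim) token representations
    gate_w: (dim, num_experts) gating weights
    Returns, per token, the indices of the k chosen experts and
    softmax mixing weights over just those k experts.
    """
    logits = hidden @ gate_w                       # (tokens, num_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]     # k highest-scoring experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # mixing weights sum to 1
    return topk, w
```

Because only K of the 64 experts execute per token, compute scales with the active parameter count (13B) rather than the full 80B, even though all experts' weights must still be resident, which is why quantizing the expert layers dominates the disk/VRAM savings.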

Quantization Details

Layers quantized to INT8:

  • Feed-forward networks (FFN/MLP layers)
  • Expert layers in MoE architecture (64 experts per layer)
  • Large linear transformations

Kept in full precision (BF16):

  • VAE encoder/decoder (critical for image quality)
  • Attention projection layers (q_proj, k_proj, v_proj, o_proj)
  • Patch embedding layers
  • Time embedding layers
  • Vision model (SigLIP2)
  • Final output layers
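Skip-module selection like the list above typically works by substring-matching parameter names against a keep-in-BF16 list. A minimal sketch of that filter; the pattern strings are assumptions mirroring the bullets, not the exact parameter names in this checkpoint:

```python
# Hypothetical patterns mirroring the BF16 skip list above.
SKIP_PATTERNS = (
    "vae",                                    # VAE encoder/decoder
    "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
    "patch_embed", "time_embed",              # embedding layers
    "vision",                                 # SigLIP2 vision tower
    "final_layer", "lm_head",                 # final output layers
)

def keep_bf16(module_name: str) -> bool:
    """True if a module should stay in BF16 rather than be quantized."""
    return any(p in module_name for p in SKIP_PATTERNS)
```

Under this scheme, MoE expert FFNs and other large linears fall through the filter and get quantized to INT8, while anything matching a pattern is loaded in full precision.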

Usage

ComfyUI (Recommended)

This model is designed to work with the Comfy_HunyuanImage3 custom nodes:

cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3

After installing the nodes:

  1. Download this model to your preferred models directory
  2. Use the "Hunyuan 3 V2 Unified" node
  3. Point the model path to this folder and select int8 precision
  4. Set blocks_to_swap to -1 (auto) or a manual value based on your VRAM

Block Swap

Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them to GPU on demand during each diffusion step.

blocks_to_swap   VRAM Saved   Recommended For
0                0 GB         96GB+ GPU (no swap needed)
4                ~10 GB       80-90GB GPU
8                ~20 GB       64-80GB GPU
16               ~40 GB       48-64GB GPU
-1 (auto)        varies       Let the system decide
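The swap mechanic described above can be sketched without any GPU: resident blocks stay on the device, while each swapped block is copied in just before its forward pass and evicted right after. This is a simplified illustration of the idea, not this node pack's implementation; the `Block` class and device strings are stand-ins for real transformer blocks and `torch` device moves.

```python
class Block:
    """Stand-in for a transformer block; tracks its current device."""
    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

    def forward(self, x):
        assert self.device == "cuda", "block must be on GPU to run"
        return x  # a real block would compute attention + MoE FFN here

def run_step(blocks, x, blocks_to_swap):
    """One diffusion step with the last `blocks_to_swap` blocks kept on CPU.

    Resident blocks are assumed to already live on the GPU; swapped
    blocks are shuttled host->device just in time, then evicted so at
    most one swapped block occupies VRAM at a time.
    """
    resident = len(blocks) - blocks_to_swap
    for i, blk in enumerate(blocks):
        swapped = i >= resident
        if swapped:
            blk.to("cuda")    # H2D copy just before use
        x = blk.forward(x)
        if swapped:
            blk.to("cpu")     # free VRAM for the next swapped block
    return x
```

The trade-off is bandwidth for memory: every diffusion step re-transfers the swapped blocks' weights over PCIe, so larger `blocks_to_swap` values lower peak VRAM but slow each step.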

Original Model

This is a quantized derivative of Tencent's HunyuanImage-3.0.

License

This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
