Hunyuan Image 3.0 Instruct — INT8 Quantized
INT8 quantization of the HunyuanImage-3.0 Instruct model. Supports text-to-image, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement (recaption/think_recaption).
Key Features
- 🎯 Instruct model — supports text-to-image, image editing, multi-image fusion
- 🧠 Chain-of-Thought — built-in `think_recaption` mode for highest quality
- 💾 INT8 quantized — ~81 GB on disk
- ⚡ 50 diffusion steps (full quality)
- 🔧 ComfyUI ready — works with Comfy_HunyuanImage3 nodes
VRAM Requirements
| Component | Memory |
|---|---|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| Total | ~92-100 GB |
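The figures above follow from simple arithmetic: INT8 stores one byte per parameter, so an 80B-parameter model needs roughly 80 GB for weights alone (versus ~160 GB in BF16). A minimal back-of-envelope sketch (illustrative only; actual usage varies by implementation):

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param

int8_weights = weight_gb(80, 1.0)   # INT8: 1 byte/param  -> ~80 GB
bf16_weights = weight_gb(80, 2.0)   # BF16: 2 bytes/param -> ~160 GB

print(f"INT8 weights: ~{int8_weights:.0f} GB")
print(f"BF16 weights: ~{bf16_weights:.0f} GB")
# Adding the 12-20 GB inference overhead gives the ~92-100 GB total above.
print(f"Estimated total: ~{int8_weights + 12:.0f}-{int8_weights + 20:.0f} GB")
```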
Recommended Hardware:
- NVIDIA RTX 6000 Blackwell (96GB) — fits entirely in VRAM ✅
- NVIDIA RTX 6000 Ada (48GB) — requires CPU offloading
- Multi-GPU setups with 80GB+ combined VRAM
Model Details
- Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- Parameters: 80B total, 13B active per token (top-K MoE routing)
- Variant: Instruct (Full)
- Quantization: INT8 per-channel quantization via bitsandbytes
- Diffusion Steps: 50
- Default Guidance Scale: 2.5
- Resolution: Up to 2048x2048
- Language: English and Chinese prompts
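The "13B active per token" figure comes from top-K MoE routing: a router scores all experts for each token, and only the K highest-scoring experts run. A simplified NumPy sketch of this idea (the real router differs in detail; the sizes here are toy values except for the 64 experts mentioned below):

```python
import numpy as np

def top_k_route(hidden: np.ndarray, router_w: np.ndarray, k: int):
    """Pick the top-k experts per token and softmax-normalize their gate weights."""
    logits = hidden @ router_w                      # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)      # mixing weights sum to 1
    return top_idx, gates

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 32))       # 4 tokens, toy hidden size 32
router_w = rng.standard_normal((32, 64))    # 64 experts per layer
idx, gates = top_k_route(hidden, router_w, k=8)
print(idx.shape, gates.shape)  # (4, 8) (4, 8)
```

Because only K of 64 experts execute per token, compute scales with the active parameter count (13B) rather than the total (80B), even though all weights must reside in memory.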
Quantization Details
Layers quantized to INT8:
- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations
Kept in full precision (BF16):
- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
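The scheme above can be sketched in two parts: a name-based filter that decides which layers stay in BF16, and symmetric per-channel INT8 quantization for the rest. This is an illustrative NumPy sketch, not the actual bitsandbytes implementation, and the skip patterns are hypothetical names modeled on the list above:

```python
import numpy as np

# Hypothetical substrings matching the BF16 skip list above.
SKIP_PATTERNS = ("vae", "q_proj", "k_proj", "v_proj", "o_proj",
                 "patch_embed", "time_embed", "vision", "final")

def should_quantize(layer_name: str) -> bool:
    """Quantize a layer only if it matches none of the sensitive patterns."""
    return not any(p in layer_name for p in SKIP_PATTERNS)

def quantize_per_channel(w: np.ndarray):
    """Symmetric per-output-channel INT8 quantization (rows = output channels)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
q, s = quantize_per_channel(w)
err = np.abs(dequantize(q, s) - w).max()
print(should_quantize("mlp.experts.3.up_proj"), should_quantize("attn.q_proj"), err)
```

Per-channel scales keep the rounding error bounded by half a quantization step per output channel, which is why expert/FFN weights tolerate INT8 well while the VAE and attention projections are left in BF16.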
Usage
ComfyUI (Recommended)
This model is designed to work with the Comfy_HunyuanImage3 custom nodes:
```shell
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```
- Download this model to your ComfyUI models directory
- Use the "Hunyuan 3 Instruct Loader" node
- Select this model folder and choose `int8` precision
- Connect to the "Hunyuan 3 Instruct Generate" node for text-to-image
- Or use "Hunyuan 3 Instruct Edit" for image editing
- Or use "Hunyuan 3 Instruct Multi-Fusion" for combining multiple images
Bot Task Modes
The Instruct model supports three generation modes:
| Mode | Description | Speed |
|---|---|---|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |
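The three modes differ only in how many text-generation stages run before diffusion. A minimal dispatch sketch (the stage names and structure here are hypothetical, not the actual Comfy_HunyuanImage3 API):

```python
# Maps each bot-task mode to the pipeline stages it runs, per the table above.
PIPELINES: dict[str, list[str]] = {
    "image": ["diffuse"],
    "recaption": ["rewrite_prompt", "diffuse"],
    "think_recaption": ["cot_reasoning", "rewrite_prompt", "diffuse"],
}

def stages_for(mode: str) -> list[str]:
    """Return the ordered stages for a mode; reject unknown modes."""
    if mode not in PIPELINES:
        raise ValueError(f"unknown bot task: {mode!r}")
    return PIPELINES[mode]

print(stages_for("think_recaption"))
```

Each extra stage invokes the language model before any diffusion step runs, which is why `think_recaption` is the slowest but typically yields the richest prompts.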
Original Model
This is a quantized derivative of Tencent's HunyuanImage-3.0 Instruct.
- Architecture: Diffusion Transformer with Mixture-of-Experts
- Resolution: Up to 2048x2048
- Language Support: English and Chinese prompts
- License: Tencent Hunyuan Community License
Limitations
- Requires high-end professional GPU (~92-100 GB VRAM)
- INT8 quantization may introduce minor quality differences in edge cases
- Model loading adds ~1-2 minutes of overhead before the first generation
- CoT/recaption modes require additional time for text generation phase
Credits
- Original Model: Tencent Hunyuan Team
- Quantization: Eric Rollei
- ComfyUI Integration: Comfy_HunyuanImage3
License
This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
Please review the original license for commercial use restrictions and requirements.
Citation
```bibtex
@misc{hunyuan-image-3-int8-instruct,
  author       = {Rollei, Eric},
  title        = {Hunyuan Image 3.0 Instruct — INT8 Quantized},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-INT8}}
}
```