Hunyuan Image 3.0 Instruct — INT8 Quantized
INT8 quantization of the HunyuanImage-3.0 Instruct model. Supports text-to-image, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement (recaption/think_recaption).
Key Features
- 🎯 Instruct model — supports text-to-image, image editing, multi-image fusion
- 🧠 Chain-of-Thought — built-in `think_recaption` mode for highest quality
- 💾 INT8 quantized — ~81 GB on disk
- ⚡ 50 diffusion steps (full quality)
- 🔧 ComfyUI ready — works with Comfy_HunyuanImage3 nodes
VRAM Requirements
| Component | Memory |
|---|---|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| Total | ~92-100 GB |
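The figures above follow from simple arithmetic: INT8 stores one byte per parameter, so an 80B-parameter model needs roughly 80 GB for weights alone (versus ~160 GB in BF16). A minimal back-of-envelope sketch (illustrative only; actual usage varies by implementation):

```python
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (using 1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param

int8_weights = weight_gb(80, 1.0)   # INT8: 1 byte/param  -> ~80 GB
bf16_weights = weight_gb(80, 2.0)   # BF16: 2 bytes/param -> ~160 GB

print(f"INT8 weights: ~{int8_weights:.0f} GB")
print(f"BF16 weights: ~{bf16_weights:.0f} GB")
# Adding the 12-20 GB inference overhead gives the ~92-100 GB total above.
print(f"Estimated total: ~{int8_weights + 12:.0f}-{int8_weights + 20:.0f} GB")
```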
Recommended Hardware:
- NVIDIA RTX 6000 Blackwell (96GB) — fits entirely in VRAM ✅
- NVIDIA RTX 6000 Ada (48GB) — requires CPU offloading
- Multi-GPU setups with 80GB+ combined VRAM
Model Details
- Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- Parameters: 80B total, 13B active per token (top-K MoE routing)
- Variant: Instruct (Full)
- Quantization: INT8 per-channel quantization via bitsandbytes
- Diffusion Steps: 50
- Default Guidance Scale: 2.5
- Resolution: Up to 2048x2048
- Language: English and Chinese prompts
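The "13B active per token" figure comes from top-K MoE routing: a router scores all experts for each token, and only the K highest-scoring experts run. A simplified NumPy sketch of this idea (the real router differs in detail; the sizes here are toy values except for the 64 experts mentioned below):

```python
import numpy as np

def top_k_route(hidden: np.ndarray, router_w: np.ndarray, k: int):
    """Pick the top-k experts per token and softmax-normalize their gate weights."""
    logits = hidden @ router_w                      # (tokens, n_experts)
    top_idx = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = np.exp(top_logits - top_logits.max(axis=-1, keepdims=True))
    gates /= gates.sum(axis=-1, keepdims=True)      # mixing weights sum to 1
    return top_idx, gates

rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 32))       # 4 tokens, toy hidden size 32
router_w = rng.standard_normal((32, 64))    # 64 experts per layer
idx, gates = top_k_route(hidden, router_w, k=8)
print(idx.shape, gates.shape)  # (4, 8) (4, 8)
```

Because only K of 64 experts execute per token, compute scales with the active parameter count (13B) rather than the total (80B), even though all weights must reside in memory.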
Quantization Details
Layers quantized to INT8:
- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations
Kept in full precision (BF16):
- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
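The scheme above can be sketched in two parts: a name-based filter that decides which layers stay in BF16, and symmetric per-channel INT8 quantization for the rest. This is an illustrative NumPy sketch, not the actual bitsandbytes implementation, and the skip patterns are hypothetical names modeled on the list above:

```python
import numpy as np

# Hypothetical substrings matching the BF16 skip list above.
SKIP_PATTERNS = ("vae", "q_proj", "k_proj", "v_proj", "o_proj",
                 "patch_embed", "time_embed", "vision", "final")

def should_quantize(layer_name: str) -> bool:
    """Quantize a layer only if it matches none of the sensitive patterns."""
    return not any(p in layer_name for p in SKIP_PATTERNS)

def quantize_per_channel(w: np.ndarray):
    """Symmetric per-output-channel INT8 quantization (rows = output channels)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 64)).astype(np.float32)
q, s = quantize_per_channel(w)
err = np.abs(dequantize(q, s) - w).max()
print(should_quantize("mlp.experts.3.up_proj"), should_quantize("attn.q_proj"), err)
```

Per-channel scales keep the rounding error bounded by half a quantization step per output channel, which is why expert/FFN weights tolerate INT8 well while the VAE and attention projections are left in BF16.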
Usage
ComfyUI (Recommended)
This model is designed to work with the Comfy_HunyuanImage3 custom nodes:
```shell
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```
- Download this model to your ComfyUI models directory
- Use the "Hunyuan 3 Instruct Loader" node
- Select this model folder and choose `int8` precision
- Connect to the "Hunyuan 3 Instruct Generate" node for text-to-image
- Or use "Hunyuan 3 Instruct Edit" for image editing
- Or use "Hunyuan 3 Instruct Multi-Fusion" for combining multiple images
Bot Task Modes
The Instruct model supports three generation modes:
| Mode | Description | Speed |
|---|---|---|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |
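The three modes differ only in how many text-generation stages run before diffusion. A minimal dispatch sketch (the stage names and structure here are hypothetical, not the actual Comfy_HunyuanImage3 API):

```python
# Maps each bot-task mode to the pipeline stages it runs, per the table above.
PIPELINES: dict[str, list[str]] = {
    "image": ["diffuse"],
    "recaption": ["rewrite_prompt", "diffuse"],
    "think_recaption": ["cot_reasoning", "rewrite_prompt", "diffuse"],
}

def stages_for(mode: str) -> list[str]:
    """Return the ordered stages for a mode; reject unknown modes."""
    if mode not in PIPELINES:
        raise ValueError(f"unknown bot task: {mode!r}")
    return PIPELINES[mode]

print(stages_for("think_recaption"))
```

Each extra stage invokes the language model before any diffusion step runs, which is why `think_recaption` is the slowest but typically yields the richest prompts.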
Original Model
This is a quantized derivative of Tencent's HunyuanImage-3.0 Instruct.
- Architecture: Diffusion Transformer with Mixture-of-Experts
- Resolution: Up to 2048x2048
- Language Support: English and Chinese prompts
- License: Tencent Hunyuan Community License
Limitations
- Requires high-end professional GPU (~92-100 GB VRAM)
- INT8 quantization may introduce minor quality differences in edge cases
- Model loading adds ~1-2 minutes of overhead before the first generation
- CoT/recaption modes require additional time for text generation phase
Credits
- Original Model: Tencent Hunyuan Team
- Quantization: Eric Rollei
- ComfyUI Integration: Comfy_HunyuanImage3
License
This model inherits the license from the original Hunyuan Image 3.0 model: Tencent Hunyuan Community License
Please review the original license for commercial use restrictions and requirements.
Citation
```bibtex
@misc{hunyuan-image-3-int8-instruct,
  author       = {Rollei, Eric},
  title        = {Hunyuan Image 3.0 Instruct — INT8 Quantized},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-INT8}}
}
```