---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- int8
- comfyui
- custom-nodes
- autoregressive
- DiT
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
---

# Hunyuan Image 3.0 Instruct -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 Instruct model (v2). Supports text-to-image, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement (recaption/think_recaption).

## What's New in v2

v2 uses an improved quantization recipe with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

## Key Features

- **Instruct model** -- supports text-to-image, image editing, and multi-image fusion
- **Chain-of-Thought** -- built-in `think_recaption` mode for the highest quality
- **INT8 quantized** -- ~83 GB on disk
- **50 diffusion steps** (full quality)
- **Block swap support** -- offload transformer blocks to CPU for lower VRAM
- **ComfyUI ready** -- works with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~92-100 GB** |

**Recommended Hardware:**

- **NVIDIA RTX 6000 Blackwell (96 GB)** -- fits entirely, with headroom
- With block swap (4-8 blocks): fits on 64-80 GB GPUs
- **NVIDIA RTX 6000 Ada (48 GB)** -- requires significant block swap

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct (Full)
- **Quantization:** INT8 per-channel quantization via bitsandbytes
- **Diffusion Steps:** 50
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts

## Quantization Details

**Layers quantized to INT8:**

- Feed-forward networks (FFN/MLP layers)
- Expert layers in the MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**

- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your preferred models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `int8` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |

## Block Swap

Block swap allows running the INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them onto the GPU on demand during each diffusion step.
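The swap loop can be sketched in plain PyTorch. This is illustrative only: `BlockSwapRunner` is a hypothetical helper, not part of this repo or of the ComfyUI nodes, which manage swapping internally.

```python
import torch
from torch import nn

class BlockSwapRunner:
    """Minimal block-swap sketch: keep the last N transformer blocks on
    CPU and move each one to the GPU only for its forward pass."""

    def __init__(self, blocks, blocks_to_swap, device="cuda"):
        self.blocks = list(blocks)
        self.device = device
        # The last `blocks_to_swap` blocks live on CPU between uses.
        self.swap_set = set(range(len(self.blocks) - blocks_to_swap,
                                  len(self.blocks)))
        for i, blk in enumerate(self.blocks):
            blk.to("cpu" if i in self.swap_set else device)

    def forward(self, x):
        for i, blk in enumerate(self.blocks):
            if i in self.swap_set:
                blk.to(self.device)   # swap in on demand
            x = blk(x)
            if i in self.swap_set:
                blk.to("cpu")         # swap back out to free VRAM
        return x
```

The trade-off is the PCIe transfer cost of moving each swapped block in and out once per diffusion step, which is why fewer swapped blocks run faster.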
| blocks_to_swap | VRAM Saved | Recommended For |
|----------------|------------|-----------------|
| 0 | 0 GB | 96 GB+ GPU (no swap needed) |
| 4 | ~10 GB | 80-90 GB GPU |
| 8 | ~20 GB | 64-80 GB GPU |
| 16 | ~40 GB | 48-64 GB GPU |
| -1 (auto) | varies | Let the system decide |

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct).

- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license of the original Hunyuan Image 3.0 model: [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)
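## VRAM Estimation Sketch

The block-swap numbers in this card work out to roughly 2.5 GB of VRAM saved per swapped block (4 blocks -> ~10 GB, 8 -> ~20 GB, 16 -> ~40 GB). A small, hypothetical helper (not part of the model repo) that combines this with the ~80 GB weight footprint and ~12-20 GB inference overhead listed above:

```python
# Rough VRAM estimate from the figures in this card; an approximation,
# not a measurement -- actual usage varies with resolution and batch.
GB_PER_SWAPPED_BLOCK = 2.5
WEIGHTS_GB = 80.0
INFERENCE_GB = (12.0, 20.0)  # additional low/high overhead

def estimated_vram_gb(blocks_to_swap: int) -> tuple[float, float]:
    """Return a (low, high) VRAM estimate in GB for a given swap count."""
    resident = max(WEIGHTS_GB - blocks_to_swap * GB_PER_SWAPPED_BLOCK, 0.0)
    return (resident + INFERENCE_GB[0], resident + INFERENCE_GB[1])

print(estimated_vram_gb(0))   # (92.0, 100.0) -- matches the ~92-100 GB total
print(estimated_vram_gb(8))   # (72.0, 80.0)  -- in the 64-80 GB GPU range
```

Pick the smallest `blocks_to_swap` whose high estimate fits your GPU, since each swapped block adds CPU-GPU transfer time per diffusion step.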