---

license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct-Distil
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- int8
- comfyui
- custom nodes
- autoregressive
- Dit
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
- distilled
---


# Hunyuan Image 3.0 Instruct Distil: INT8 Quantized

INT8 quantization of the HunyuanImage-3.0 Instruct Distil model. The model is CFG-distilled for ~6x faster generation (8 steps vs 50) and is distilled to match the output quality of the full Instruct model.

## Key Features

- 🎯 **Instruct model**: supports text-to-image, image editing, and multi-image fusion
- 🧠 **Chain-of-Thought**: built-in `think_recaption` mode for highest quality
- 💾 **INT8 quantized**: ~81 GB on disk
- ⚡ **8 diffusion steps** (CFG-distilled for speed)
- 🔧 **ComfyUI ready**: works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~92-100 GB** |

**Recommended Hardware:**

- **NVIDIA RTX 6000 Blackwell (96GB)**: fits entirely in VRAM ✅
- **NVIDIA RTX 6000 Ada (48GB)**: requires CPU offloading
- Multi-GPU setups with 80GB+ combined VRAM
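
The totals in the table are simple sums, and the same arithmetic can be used to sanity-check a GPU before downloading. The sketch below uses only the figures quoted above; the helper names and the wording of the three outcomes are illustrative assumptions, not part of any tool.

```python
# Figures taken from the VRAM table above; everything else is illustrative.
WEIGHTS_GB = 80            # approximate INT8 weight footprint
INFERENCE_GB = (12, 20)    # additional memory during inference (low, high)


def total_vram_range(weights_gb=WEIGHTS_GB, inference_gb=INFERENCE_GB):
    """Return the (low, high) total VRAM estimate in GB."""
    low, high = inference_gb
    return weights_gb + low, weights_gb + high


def placement(gpu_vram_gb):
    """Classify one GPU against the estimated range (hypothetical helper)."""
    low, high = total_vram_range()
    if gpu_vram_gb >= high:
        return "fits with headroom"
    if gpu_vram_gb >= low:
        return "fits at the low end of the estimate"
    return "needs CPU offloading or more GPUs"


if __name__ == "__main__":
    print(total_vram_range())  # (92, 100)
    for vram in (96, 48):
        print(f"{vram} GB: {placement(vram)}")
```

A 96 GB card lands inside the 92-100 GB range rather than above it, so large resolutions may leave little headroom.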

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct Distil (CFG-Distilled, 8-step)
- **Quantization:** INT8 per-channel quantization via bitsandbytes
- **Diffusion Steps:** 8
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts
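
To make "per-channel quantization" concrete, here is a minimal pure-Python sketch of symmetric absmax quantization with one scale per output channel (row). It illustrates the general idea only; it is not the bitsandbytes implementation, whose kernels and outlier handling differ, and the function names are made up for this example.

```python
def quantize_per_channel(weight):
    """Quantize each row (output channel) of a 2-D weight to INT8.

    Returns (q_rows, scales): q_rows holds ints in [-127, 127] and scales
    holds one float per row, so that row is approximately q_row * scale.
    """
    q_rows, scales = [], []
    for row in weight:
        absmax = max(abs(v) for v in row)
        # Guard against an all-zero row, where any scale works.
        scale = absmax / 127.0 if absmax > 0 else 1.0
        q_rows.append([round(v / scale) for v in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Reconstruct approximate float weights from INT8 values and scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.04]]
    q, s = quantize_per_channel(w)
    print("int8 rows:", q)
    print("scales:", s)
    print("reconstructed:", dequantize(q, s))
```

Because each row gets its own scale, the second row with small values still uses the full INT8 range instead of collapsing toward zero, which is the main reason to quantize per channel rather than per tensor.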

### Distillation

This is the **CFG-Distilled** variant, which means:
- Only **8 diffusion steps** needed (vs 50 for the full Instruct model)
- **~6x faster** image generation
- No quality loss: distilled to match the full model's output
- `cfg_distilled: true` in the config means no classifier-free guidance is needed
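
The "~6x" figure follows directly from the step counts. A quick check, assuming the cost per step is roughly constant and ignoring fixed overhead such as model loading and VAE decoding:

```python
FULL_STEPS = 50       # full Instruct model, per the list above
DISTILLED_STEPS = 8   # this CFG-distilled variant


def step_speedup(full_steps=FULL_STEPS, distilled_steps=DISTILLED_STEPS):
    """Ratio of denoising steps; ignores per-step cost and fixed overhead."""
    return full_steps / distilled_steps


if __name__ == "__main__":
    print(f"{step_speedup():.2f}x fewer steps")  # 6.25x fewer steps
```

Measured wall-clock speedup can differ from this ratio, since text generation in the recaption modes is not affected by the number of diffusion steps.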

## Quantization Details

**Layers quantized to INT8:**
- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**
- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
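
One common way to implement a split like this is a name-based filter that decides, per module, whether to quantize. The sketch below is an assumption about how such a filter could look: `q_proj`, `k_proj`, `v_proj`, and `o_proj` come from the list above, while the other name fragments are hypothetical stand-ins, and the real module names in the checkpoint may differ.

```python
# Name fragments for modules kept in BF16. Only the four attention
# projection names are taken from the list above; the rest are
# hypothetical placeholders for the listed categories.
KEEP_BF16_PATTERNS = (
    "vae",
    "q_proj", "k_proj", "v_proj", "o_proj",
    "patch_embed",
    "time_embed",
    "vision_model",
    "final_layer",
)


def should_quantize(module_name, is_linear=True):
    """Return True if a module would be quantized to INT8 in this sketch.

    Only linear layers are candidates, and any module whose name contains
    a keep-pattern stays in full precision.
    """
    if not is_linear:
        return False
    return not any(pattern in module_name for pattern in KEEP_BF16_PATTERNS)


if __name__ == "__main__":
    for name in (
        "model.layers.0.mlp.experts.3.up_proj",   # hypothetical expert layer
        "model.layers.0.self_attn.q_proj",        # attention projection
        "vision_model.encoder.layers.0.mlp.fc1",  # hypothetical vision layer
    ):
        print(name, "->", "INT8" if should_quantize(name) else "BF16")
```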

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your ComfyUI models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `int8` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images
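
Step 1 can be scripted with `snapshot_download` from the `huggingface_hub` package. This is a sketch under assumptions: the repository ID is taken from the citation URL at the end of this card, and the `models` subfolder is a guess at the layout, so check the Comfy_HunyuanImage3 documentation for the folder the loader node actually scans.

```python
from pathlib import Path

# Repository ID as it appears in the citation URL of this card.
REPO_ID = "EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8"


def target_dir(comfy_root, subdir="models"):
    """Build the local folder for the model (the layout is an assumption)."""
    return Path(comfy_root) / subdir / REPO_ID.split("/")[-1]


def download_model(comfy_root):
    """Download the full repository (~81 GB) into the target folder."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    dest = target_dir(comfy_root)
    dest.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(dest))
    return dest


if __name__ == "__main__":
    # Prints the destination only; call download_model("ComfyUI") to fetch.
    print(target_dir("ComfyUI"))
```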

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image, prompt used as-is | Fastest |
| `recaption` | Model rewrites prompt into detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |
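
When driving the model from a script, it can help to validate the mode string early and choose it from intent. The helper below is a hypothetical sketch: only the three mode names come from the table above, and the two preference flags are invented for illustration.

```python
# Mode names from the table above, ordered fastest to slowest.
MODES = ("image", "recaption", "think_recaption")


def validate_mode(mode):
    """Return the mode unchanged, or raise if it is not a documented mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    return mode


def pick_mode(rewrite_prompt=False, reason_first=False):
    """Map two hypothetical preference flags onto a documented mode."""
    if reason_first:
        return "think_recaption"  # CoT reasoning, then prompt enhancement
    if rewrite_prompt:
        return "recaption"        # model rewrites the prompt first
    return "image"                # prompt used as-is


if __name__ == "__main__":
    print(pick_mode())                     # image
    print(pick_mode(rewrite_prompt=True))  # recaption
    print(pick_mode(reason_first=True))    # think_recaption
```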

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil).

- **Architecture:** Diffusion Transformer with Mixture-of-Experts
- **Resolution:** Up to 2048x2048
- **Language Support:** English and Chinese prompts
- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Limitations

- Requires a high-end professional GPU (~92-100 GB VRAM)
- INT8 quantization may introduce minor quality differences in edge cases
- Model loading adds ~1-2 minutes of overhead to the first generation
- CoT/recaption modes require additional time for the text generation phase

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license from the original Hunyuan Image 3.0 model:
[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

Please review the original license for commercial use restrictions and requirements.

## Citation

```bibtex
@misc{hunyuan-image-3-int8-instruct,
  author = {Rollei, Eric},
  title = {Hunyuan Image 3.0 Instruct Distil: INT8 Quantized},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8}}
}
```