---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct-Distil
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- int8
- comfyui
- custom nodes
- autoregressive
- Dit
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
- distilled
---

# Hunyuan Image 3.0 Instruct Distil - INT8 Quantized

INT8 quantization of the HunyuanImage-3.0 Instruct Distil model. The model is CFG-distilled to generate in 8 diffusion steps instead of 50 (roughly 6x faster) and is distilled to match the output quality of the full Instruct model.

## Key Features

- 🎯 **Instruct model**: supports text-to-image, image editing, multi-image fusion
- 🧠 **Chain-of-Thought**: built-in `think_recaption` mode for highest quality
- 💾 **INT8 quantized**: ~81 GB on disk
- ⚡ **8 diffusion steps** (CFG-distilled for speed)
- 🔧 **ComfyUI ready**: works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~92-100 GB** |

**Recommended Hardware:**

- **NVIDIA RTX 6000 Blackwell (96GB)**: fits entirely in VRAM ✅
- **NVIDIA RTX 6000 Ada (48GB)**: requires CPU offloading
- Multi-GPU setups with 80GB+ combined VRAM

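
The totals in the table are the weight footprint plus the inference overhead range. The sketch below redoes that arithmetic and compares it with a card's capacity. The function name `vram_budget` and the verdict strings are illustrative assumptions, not part of the model or the ComfyUI nodes, and the figures are the approximate ones from the table.

```python
# Approximate figures from the VRAM table above, in GB.
WEIGHTS_GB = 80
INFERENCE_GB_RANGE = (12, 20)


def vram_budget(capacity_gb: float) -> tuple[int, int, str]:
    """Return (low, high, verdict) for a device with `capacity_gb` of VRAM.

    `low` and `high` bound the estimated total; the verdict strings are
    illustrative labels only.
    """
    low = WEIGHTS_GB + INFERENCE_GB_RANGE[0]
    high = WEIGHTS_GB + INFERENCE_GB_RANGE[1]
    if capacity_gb >= high:
        verdict = "fits with headroom"
    elif capacity_gb >= low:
        verdict = "fits if inference overhead stays low"
    else:
        verdict = "needs CPU offloading"
    return low, high, verdict


for name, capacity in [("RTX 6000 Blackwell", 96), ("RTX 6000 Ada", 48)]:
    print(name, vram_budget(capacity))
```

Actual usage depends on resolution and generation mode, so treat the result as a rough planning aid rather than a guarantee.
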
## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct Distil (CFG-Distilled, 8-step)
- **Quantization:** INT8 per-channel quantization via bitsandbytes
- **Diffusion Steps:** 8
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts

### Distillation

This is the **CFG-Distilled** variant, which means:

- Only **8 diffusion steps** are needed (vs 50 for the full Instruct model)
- **~6x faster** image generation
- No quality loss: the model is distilled to match the full model's output
- `cfg_distilled: true` in the config means no classifier-free guidance is needed

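
The "~6x" figure follows from the step counts alone. A minimal worked calculation, assuming every diffusion step costs about the same and ignoring fixed costs such as model loading or prompt rewriting (the function name `step_speedup` is illustrative):

```python
FULL_STEPS = 50       # full Instruct model
DISTILLED_STEPS = 8   # this CFG-distilled variant


def step_speedup(full_steps: int, distilled_steps: int) -> float:
    """Ratio of diffusion steps, used as a rough proxy for speedup."""
    return full_steps / distilled_steps


print(f"{step_speedup(FULL_STEPS, DISTILLED_STEPS):.2f}x fewer steps")  # 6.25x fewer steps
```
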
## Quantization Details

**Layers quantized to INT8:**

- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**

- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers

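
The split above amounts to a name-based rule: a module stays in BF16 if its path contains one of the protected components, and is quantized otherwise. The sketch below illustrates that rule only. Of the names used, just `q_proj`, `k_proj`, `v_proj` and `o_proj` come from this card; the other component names and the example module paths are hypothetical placeholders, and the rule simplifies by treating every unprotected module as a quantizable linear layer.

```python
# Path components that keep a module in BF16.
# Only the four attention projection names are taken from this card;
# the remaining entries are hypothetical stand-ins for the VAE, patch/time
# embeddings, vision model and final output layers.
KEEP_BF16_COMPONENTS = {
    "q_proj", "k_proj", "v_proj", "o_proj",
    "vae", "patch_embed", "time_embed", "vision_model", "final_layer",
}


def precision_for(module_name: str) -> str:
    """Return "bf16" if any dotted path component is protected, else "int8"."""
    parts = module_name.split(".")
    if any(part in KEEP_BF16_COMPONENTS for part in parts):
        return "bf16"
    return "int8"


# Hypothetical module paths, for illustration only.
for name in [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.experts.17.down_proj",
    "vae.decoder.conv_out",
]:
    print(f"{name}: {precision_for(name)}")
```

In `transformers`, a skip list of this kind is usually passed through the `llm_int8_skip_modules` argument of `BitsAndBytesConfig`; whether this checkpoint was produced exactly that way is not stated here.
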
## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your ComfyUI models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `int8` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image, prompt used as-is | Fastest |
| `recaption` | Model rewrites prompt into detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |

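
The three modes differ only in how many text stages run before image generation, which is why speed drops as stages are added. A minimal sketch of that relationship follows; the mode names come from the table, while the `BOT_TASK_STAGES` mapping, the stage labels and the `stages_for` helper are illustrative assumptions rather than the node pack's actual API.

```python
# Stages each mode runs, in order. Mode names are from the table above;
# the stage labels are descriptive placeholders.
BOT_TASK_STAGES = {
    "image": ["generate image"],
    "recaption": ["rewrite prompt", "generate image"],
    "think_recaption": [
        "chain-of-thought reasoning",
        "rewrite prompt",
        "generate image",
    ],
}


def stages_for(bot_task: str) -> list[str]:
    """Return the ordered stages for a mode, or raise on an unknown mode."""
    try:
        return BOT_TASK_STAGES[bot_task]
    except KeyError:
        valid = ", ".join(BOT_TASK_STAGES)
        raise ValueError(f"unknown mode {bot_task!r}; expected one of: {valid}") from None


for mode in BOT_TASK_STAGES:
    print(f"{mode}: {' -> '.join(stages_for(mode))}")
```
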
## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil).

- **Architecture:** Diffusion Transformer with Mixture-of-Experts
- **Resolution:** Up to 2048x2048
- **Language Support:** English and Chinese prompts
- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Limitations

- Requires a high-end professional GPU (~92-100 GB VRAM to run without CPU offloading)
- INT8 quantization may introduce minor quality differences in edge cases
- Model loading adds ~1-2 minutes to the first generation
- CoT/recaption modes require additional time for the text generation phase

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license from the original Hunyuan Image 3.0 model:
[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

Please review the original license for commercial use restrictions and requirements.

## Citation

```bibtex
@misc{hunyuan-image-3-int8-instruct,
  author = {Rollei, Eric},
  title = {Hunyuan Image 3.0 Instruct Distil - INT8 Quantized},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8}}
}
```