---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct-Distil
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- nf4
- comfyui
- custom-nodes
- autoregressive
- DiT
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
- 4bit
- distilled
---
# Hunyuan Image 3.0 Instruct Distil -- NF4 Quantized (v2)
NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct Distil model (v2). This is the most accessible variant: it fits on a single 48GB GPU and generates images ~6x faster (8 diffusion steps instead of 50), offering the best balance of speed, quality, and VRAM.
## What's New in v2
v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.
## Key Features
- **Instruct model** -- supports text-to-image, image editing, multi-image fusion
- **Chain-of-Thought** -- built-in `think_recaption` mode for highest quality
- **NF4 quantized** -- ~48 GB on disk
- **8 diffusion steps** (CFG-distilled)
- **Block swap support** -- offload transformer blocks to CPU for lower VRAM
- **ComfyUI ready** -- works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes
## VRAM Requirements
| Component | Memory |
|-----------|--------|
| Weight loading | ~29 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~41-49 GB** |
**Recommended Hardware:**
- **Single 48GB GPU** (RTX 6000 Ada, RTX PRO 5000, A6000)
- With block swap: may work on 24GB GPUs (swapping ~20 blocks)
## Model Details
- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct Distil (CFG-Distilled, 8-step)
- **Quantization:** 4-bit NormalFloat (NF4) quantization via bitsandbytes with double quantization
- **Diffusion Steps:** 8
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts
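To illustrate the "13B active per token" figure: in top-K MoE routing, a router scores all experts for each token and only the K highest-scoring experts run, so only a fraction of the 80B total parameters is active at once. A minimal conceptual sketch (the scores and K value below are illustrative, not the model's actual router):

```python
# Toy top-K MoE routing: score all experts, activate only the top K.
# Expert count and scores are illustrative; the real model uses 64
# experts per layer with a learned router.

def route_top_k(router_scores, k=2):
    """Return the sorted indices of the k highest-scoring experts."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return sorted(ranked[:k])

scores = [0.1, 0.7, 0.05, 0.9, 0.3, 0.2, 0.4, 0.6]  # toy scores, 8 experts
print(route_top_k(scores, k=2))  # -> [1, 3]
```

Each token's activations only pass through the selected experts' FFNs, which is why active parameters per token stay far below the total parameter count.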
### Distillation
This is the **CFG-Distilled** variant:
- Only **8 diffusion steps** needed (vs 50 for the full Instruct model)
- **~6x faster** image generation
- No quality loss -- distilled to match the full model's output
- `cfg_distilled: true` means no classifier-free guidance needed
## Quantization Details
**Layers quantized to NF4:**
- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations
**Kept in full precision (BF16):**
- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
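For intuition, NF4 maps each weight in a block to the nearest of 16 fixed "NormalFloat" levels after scaling by the block's absolute maximum. The sketch below uses the 16-level codebook as published in the QLoRA paper and omits double quantization (which additionally quantizes the per-block scales); it is a conceptual illustration, not the bitsandbytes implementation.

```python
# Minimal NF4 quantize/dequantize sketch. Assumptions: QLoRA's 16-level
# NormalFloat codebook, per-block absmax scaling; double quantization of
# the scales is omitted for clarity.

NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def nf4_quantize(block):
    """Scale by the block's absmax, then snap each weight to the
    nearest codebook level; store 4-bit indices plus one scale."""
    absmax = max(abs(w) for w in block) or 1.0
    idxs = [min(range(16), key=lambda i: abs(NF4_CODEBOOK[i] - w / absmax))
            for w in block]
    return idxs, absmax

def nf4_dequantize(idxs, absmax):
    """Reconstruct approximate weights from indices and the scale."""
    return [NF4_CODEBOOK[i] * absmax for i in idxs]

weights = [0.31, -0.12, 0.0, 0.84, -0.55]
idxs, scale = nf4_quantize(weights)
restored = nf4_dequantize(idxs, scale)
```

Note that 0.0 and the block's extreme values round-trip exactly, which is one reason NF4 works well for roughly normally distributed weights.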
## Usage
### ComfyUI (Recommended)
This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```
1. Download this model to your preferred models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `nf4` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images
### Bot Task Modes
The Instruct model supports three generation modes:
| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image, prompt used as-is | Fastest |
| `recaption` | Model rewrites prompt into detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |
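Conceptually, the three modes differ only in how many stages run before image generation. A sketch of the pipelines (stage names here are illustrative, not actual node or API identifiers):

```python
# Illustrative mapping of bot task modes to their generation stages.
# More stages = better prompts but slower generation.

PIPELINES = {
    "image": ["generate"],
    "recaption": ["rewrite_prompt", "generate"],
    "think_recaption": ["chain_of_thought", "rewrite_prompt", "generate"],
}

def stages_for(mode):
    """Return the ordered stages a prompt passes through in a given mode."""
    return PIPELINES[mode]
```

This is why `image` is fastest (the prompt goes straight to generation) and `think_recaption` is slowest but highest quality.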
## Block Swap
Block swap allows running the model on GPUs with less VRAM than it would
otherwise require. The loader keeps N transformer blocks in CPU memory and swaps
them onto the GPU on demand during each diffusion step.
| blocks_to_swap | VRAM Saved | Recommended For |
|---------------|------------|-----------------|
| 0 | 0 GB | 96GB+ GPU (no swap needed) |
| 4 | ~5 GB | 80-90GB GPU |
| 8 | ~10 GB | 64-80GB GPU |
| 16 | ~19 GB | 48-64GB GPU |
| -1 (auto) | varies | Let the system decide |
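The savings scale roughly linearly with the number of swapped blocks. A rough estimator, assuming each transformer block occupies a similar amount of memory; the per-block figure is back-solved from the table above (~19 GB for 16 blocks) and is an approximation, not a measured value:

```python
# Rough VRAM-saving estimate for block swap. GB_PER_BLOCK (~1.2 GB) is
# inferred from the table, not measured.

GB_PER_BLOCK = 19 / 16

def vram_saved_gb(blocks_to_swap, gb_per_block=GB_PER_BLOCK):
    """Approximate VRAM freed by keeping `blocks_to_swap` blocks on CPU.

    Returns None for -1 (auto mode), where savings vary at runtime.
    """
    if blocks_to_swap < 0:
        return None
    return blocks_to_swap * gb_per_block
```

Swapping more blocks frees more VRAM but adds CPU-to-GPU transfer time on every diffusion step, so use the smallest value that fits your GPU.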
## Original Model
This is a quantized derivative of [Tencent's HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil).
- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)
## Credits
- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)
## License
This model inherits the license from the original Hunyuan Image 3.0 model:
[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)