---

license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct-Distil
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- nf4
- comfyui
- custom nodes
- autoregressive
- Dit
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
- 4bit
- distilled
---


# Hunyuan Image 3.0 Instruct Distil — NF4 Quantized

NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct Distil model. This is the most accessible variant: it fits on a single 48GB GPU and generates roughly 6x faster than the full Instruct model (8 diffusion steps instead of 50). It aims for the best balance of speed, quality, and VRAM use.

## Key Features

- 🎯 **Instruct model** — supports text-to-image, image editing, multi-image fusion
- 🧠 **Chain-of-Thought** — built-in `think_recaption` mode for highest quality
- 💾 **NF4 quantized** — ~45 GB on disk
- ⚡ **8 diffusion steps** (CFG-distilled for speed)
- 🔧 **ComfyUI ready** — works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~29 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~41-49 GB** |

**Recommended Hardware:**

- **Fits on a single 48GB GPU** (RTX 6000 Ada, RTX PRO 5000, A6000)
- Consumer GPUs (RTX 4090 24GB, RTX 5090 32GB): not enough VRAM
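
As a quick sanity check on the table above, the following sketch adds the weight and inference figures and compares the result to a card's capacity. The helper name `fits_in_vram` and its three labels are illustrative assumptions, not part of the model or the ComfyUI nodes; the numbers are the approximate ones from the table.

```python
# Approximate figures from the VRAM table above (GB).
WEIGHTS_GB = 29
INFERENCE_GB = (12, 20)  # (low, high) additional memory during inference


def total_vram_range():
    """Return the (low, high) total VRAM estimate in GB."""
    low, high = INFERENCE_GB
    return WEIGHTS_GB + low, WEIGHTS_GB + high


def fits_in_vram(gpu_gb):
    """Classify a GPU by capacity (hypothetical helper and labels)."""
    low, high = total_vram_range()
    if gpu_gb >= high:
        return "comfortable"
    if gpu_gb >= low:
        return "tight"
    return "insufficient"


for name, gb in [("RTX 4090", 24), ("RTX 5090", 32), ("RTX 6000 Ada", 48)]:
    print(f"{name} ({gb} GB): {fits_in_vram(gb)}")
```

A 48 GB card lands in the "tight" band because it sits just inside the ~41-49 GB range, which is consistent with the recommendation above.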

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct Distil (CFG-Distilled, 8-step)
- **Quantization:** 4-bit NormalFloat (NF4) quantization via bitsandbytes with double quantization
- **Diffusion Steps:** 8
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts

### Distillation

This is the **CFG-Distilled** variant, which means:
- Only **8 diffusion steps** needed (vs 50 for the full Instruct model)
- **~6x faster** image generation
- No quality loss — distilled to match the full model's output
- `cfg_distilled: true` in the config means no classifier-free guidance is needed
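
The ~6x figure follows from the step counts alone, assuming each diffusion step costs about the same in both variants (an assumption; only the step counts and the rounded speedup are stated here):

```python
FULL_STEPS = 50      # full Instruct model
DISTILLED_STEPS = 8  # this CFG-distilled variant

# Ratio of diffusion steps, used as a rough proxy for generation speedup.
speedup = FULL_STEPS / DISTILLED_STEPS
print(f"Step ratio: {speedup:.2f}x")  # prints "Step ratio: 6.25x"
```

This ratio covers only the diffusion phase. Model loading and the text generation phase of the recaption modes are separate costs.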

## Quantization Details

**Layers quantized to NF4:**
- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**
- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
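
A selective scheme like the one above is commonly expressed as a name-based filter over the model's modules. The sketch below is illustrative only: apart from `q_proj`, `k_proj`, `v_proj`, and `o_proj`, which come from the list above, the keywords and the example module names are assumptions and were not read from this checkpoint.

```python
# Substrings marking modules kept in BF16. Apart from the four attention
# projections, these keywords are hypothetical placeholders.
KEEP_BF16_KEYWORDS = (
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "vae",           # VAE encoder/decoder
    "patch_embed",   # patch embedding layers
    "time_embed",    # time embedding layers
    "vision_model",  # SigLIP2 vision model
    "final_layer",   # final output layers
)


def should_quantize(module_name):
    """Return True if a linear layer with this name would be stored as NF4."""
    return not any(key in module_name for key in KEEP_BF16_KEYWORDS)


# Hypothetical module names, for illustration only.
examples = [
    "model.layers.0.mlp.experts.3.up_proj",
    "model.layers.0.self_attn.q_proj",
    "vae.decoder.conv_out",
]
for name in examples:
    target = "NF4" if should_quantize(name) else "BF16"
    print(f"{name} -> {target}")
```

When quantizing through transformers with bitsandbytes, a skip list of this kind is typically passed as `llm_int8_skip_modules` on `BitsAndBytesConfig`. Whether this checkpoint was produced that way is not stated here.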

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your ComfyUI models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `nf4` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images
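
For step 1, one way to fetch the files is `snapshot_download` from the `huggingface_hub` package. The sketch below assumes that package is installed and that the model belongs in a subfolder of `ComfyUI/models`. The exact folder the loader node scans is an assumption here, so check the Comfy_HunyuanImage3 README for the expected location.

```python
from pathlib import Path

REPO_ID = "EricRollei/HunyuanImage-3.0-Instruct-Distil-NF4"


def target_dir(comfy_root):
    """Build the local folder for the model (assumed layout: <root>/models/<repo name>)."""
    return Path(comfy_root) / "models" / REPO_ID.split("/")[-1]


def download(comfy_root):
    """Download the full repository; needs network access and ~45 GB of disk space."""
    # Imported lazily so the path helper works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=str(target_dir(comfy_root)))


if __name__ == "__main__":
    # Only prints the destination; call download("ComfyUI") to fetch the files.
    print(target_dir("ComfyUI"))
```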

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image, prompt used as-is | Fastest |
| `recaption` | Model rewrites prompt into detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |
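
The relationship between the three modes can be summarized as a mapping from mode name to the stages it runs. This is a conceptual sketch: the mode names come from the table above, while the stage labels and the `stages_for` helper are illustrative and do not correspond to identifiers in the model or node code.

```python
# Stage labels are illustrative, not identifiers from the model code.
MODE_STAGES = {
    "image": ["generate"],
    "recaption": ["rewrite_prompt", "generate"],
    "think_recaption": ["reason", "rewrite_prompt", "generate"],
}


def stages_for(mode):
    """Return the ordered stages for a mode, or raise ValueError if unknown."""
    try:
        return MODE_STAGES[mode]
    except KeyError:
        valid = ", ".join(sorted(MODE_STAGES))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None


for mode in MODE_STAGES:
    print(f"{mode}: {' -> '.join(stages_for(mode))}")
```

Each extra stage is a text generation pass before any image is produced, which is why the speed column runs from fastest to slowest in that order.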

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil).

- **Architecture:** Diffusion Transformer with Mixture-of-Experts
- **Resolution:** Up to 2048x2048
- **Language Support:** English and Chinese prompts
- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Limitations

- Requires a high-end professional GPU (~41-49 GB VRAM)
- NF4 quantization may introduce minor quality differences in edge cases
- Model loading adds ~1-2 minutes of overhead to the first generation
- CoT/recaption modes need additional time for the text generation phase

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license from the original Hunyuan Image 3.0 model:
[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

Please review the original license for commercial use restrictions and requirements.

## Citation

```bibtex
@misc{hunyuan-image-3-nf4-instruct,
  author = {Rollei, Eric},
  title = {Hunyuan Image 3.0 Instruct Distil — NF4 Quantized},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-NF4}}
}
```