---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct-Distil
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- nf4
- comfyui
- custom-nodes
- autoregressive
- DiT
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
- 4bit
- distilled
---

# Hunyuan Image 3.0 Instruct Distil -- NF4 Quantized (v2)

NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct Distil model (v2). This is the most accessible option: it fits on a single 48GB GPU and generates roughly 6x faster (8 diffusion steps vs 50), offering the best balance of speed, quality, and VRAM.

## What's New in v2

v2 uses improved quantization with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

## Key Features

- **Instruct model** -- supports text-to-image, image editing, and multi-image fusion
- **Chain-of-Thought** -- built-in `think_recaption` mode for highest quality
- **NF4 quantized** -- ~48 GB on disk
- **8 diffusion steps** (CFG-distilled)
- **Block swap support** -- offload transformer blocks to CPU for lower VRAM
- **ComfyUI ready** -- works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~29 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~41-49 GB** |

**Recommended Hardware:**

- **Single 48GB GPU** (RTX 6000 Ada, RTX PRO 5000, A6000)
- With block swap: may work on 24GB GPUs (swapping ~20 blocks)

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct Distil (CFG-distilled, 8-step)
- **Quantization:** 4-bit NormalFloat (NF4) via bitsandbytes with double quantization
- **Diffusion Steps:** 8
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts

### Distillation

This is the **CFG-Distilled** variant:

- Only **8 diffusion steps** needed (vs 50 for the full Instruct model)
- **~6x faster** image generation
- No quality loss -- distilled to match the full model's output
- `cfg_distilled: true` means no classifier-free guidance is needed

## Quantization Details

**Layers quantized to NF4:**

- Feed-forward networks (FFN/MLP layers)
- Expert layers in the MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**

- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your preferred models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `nf4` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |

## Block Swap

Block swap allows running INT8 and BF16 models on GPUs with less VRAM than the full model requires.
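The scheduling idea can be sketched in plain Python. This is a simplified illustration with hypothetical names (`Block`, `run_step`); the actual ComfyUI nodes move real torch modules and overlap transfers with compute:

```python
# Toy sketch of block swap: keep the last `blocks_to_swap` transformer
# blocks on CPU and copy each one to the GPU only while it is needed.

class Block:
    def __init__(self, idx):
        self.idx = idx
        self.device = "cpu"

def run_step(blocks, blocks_to_swap, log):
    """Run one diffusion step over all transformer blocks.

    At most (len(blocks) - blocks_to_swap + 1) blocks occupy VRAM at once:
    the resident blocks plus the single swapped-in block.
    """
    resident = len(blocks) - blocks_to_swap
    for b in blocks[:resident]:
        b.device = "cuda"            # pinned once, stays resident on GPU
    for b in blocks:
        swapped = b.device != "cuda"
        if swapped:
            b.device = "cuda"        # swap in on demand
            log.append(("in", b.idx))
        # ... forward pass through block b would happen here ...
        if swapped:
            b.device = "cpu"         # evict again to free VRAM
            log.append(("out", b.idx))

blocks = [Block(i) for i in range(8)]
log = []
run_step(blocks, blocks_to_swap=3, log=log)
print(log)  # swap-in/out events for the last 3 blocks
```

The trade-off is PCIe transfer time on every step: per the table below, swapping 16 blocks of this model keeps roughly 19 GB of weights off the GPU between uses.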
The system keeps N transformer blocks on CPU and swaps them to the GPU on demand during each diffusion step.

| blocks_to_swap | VRAM Saved | Recommended For |
|---------------|------------|-----------------|
| 0 | 0 GB | 96GB+ GPU (no swap needed) |
| 4 | ~5 GB | 80-90GB GPU |
| 8 | ~10 GB | 64-80GB GPU |
| 16 | ~19 GB | 48-64GB GPU |
| -1 (auto) | varies | Let the system decide |

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil).

- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license of the original Hunyuan Image 3.0 model: [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)
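## Appendix: What NF4 Stores

For intuition about what the NF4 format stores per weight block (one scale plus a 4-bit code per weight), here is a toy, dependency-free sketch. The codebook values approximate the published NF4 levels; the real quantization kernels live in bitsandbytes, and block sizes there differ:

```python
# Toy blockwise 4-bit quantization with a fixed 16-level codebook.
# Codebook values approximate the NF4 levels; illustrative only.

NF4_LEVELS = [
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
]

def quantize_block(weights):
    """Quantize one block: store one scale plus a 4-bit index per weight."""
    scale = max(abs(w) for w in weights) or 1.0   # absmax scaling
    idx = [min(range(16), key=lambda i: abs(NF4_LEVELS[i] - w / scale))
           for w in weights]
    return scale, idx

def dequantize_block(scale, idx):
    """Reconstruct approximate weights from scale + 4-bit codes."""
    return [NF4_LEVELS[i] * scale for i in idx]

w = [0.31, -0.07, 0.55, -0.9]
scale, idx = quantize_block(w)
w_hat = dequantize_block(scale, idx)
# each reconstructed weight is close to, but not exactly, the original
```

Double quantization (which this model uses) goes one step further and quantizes the per-block scales themselves, shaving a fraction of a bit per weight off the storage cost.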