---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- nf4
- comfyui
- custom nodes
- autoregressive
- Dit
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
- 4bit
---

# Hunyuan Image 3.0 Instruct — NF4 Quantized

NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct model. Fits on a single 48 GB GPU. Supports text-to-image, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement.

## Key Features

- 🎯 **Instruct model** — supports text-to-image, image editing, multi-image fusion
- 🧠 **Chain-of-Thought** — built-in `think_recaption` mode for highest quality
- 💾 **NF4 quantized** — ~45 GB on disk
- ⚡ **50 diffusion steps** (full quality)
- 🔧 **ComfyUI ready** — works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight Loading | ~29 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~41-49 GB** |

**Recommended Hardware:**

- **Fits on a single 48 GB GPU** (RTX 6000 Ada, RTX PRO 5000, A6000)
- Consumer GPUs (RTX 4090 with 24 GB, RTX 5090 with 32 GB) do not have enough VRAM

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct (Full)
- **Quantization:** 4-bit NormalFloat (NF4) via bitsandbytes, with double quantization
- **Diffusion Steps:** 50
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts

## Quantization Details

**Layers quantized to NF4:**

- Feed-forward networks (FFN/MLP layers)
- Expert layers in MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**

- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your ComfyUI models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `nf4` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image, prompt used as-is | Fastest |
| `recaption` | Model rewrites prompt into detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct).

- **Architecture:** Diffusion Transformer with Mixture-of-Experts
- **Resolution:** Up to 2048x2048
- **Language Support:** English and Chinese prompts
- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Limitations

- Requires a high-end professional GPU (~41-49 GB VRAM)
- NF4 quantization may introduce minor quality differences in edge cases
- Model loading adds ~1-2 minutes of overhead to the first generation
- CoT/recaption modes require additional time for the text generation phase

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license from the original Hunyuan Image 3.0 model: [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

Please review the original license for commercial use restrictions and requirements.

## Citation

```bibtex
@misc{hunyuan-image-3-nf4-instruct,
  author = {Rollei, Eric},
  title = {Hunyuan Image 3.0 Instruct — NF4 Quantized},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-NF4}}
}
```
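
## Appendix: Checking the VRAM Budget

The total in the VRAM Requirements table is the sum of the weight footprint and the additional inference memory. The sketch below reproduces that arithmetic and classifies a GPU by its VRAM. It is only an illustration: the helper names `vram_budget` and `fit_status` and the three status labels are hypothetical and not part of this model or the ComfyUI nodes, and the input figures are the approximate values quoted in this card.

```python
# Approximate figures copied from the VRAM Requirements table (GB).
WEIGHTS_GB = 29
INFERENCE_GB_RANGE = (12, 20)


def vram_budget(weights_gb=WEIGHTS_GB, inference_gb_range=INFERENCE_GB_RANGE):
    """Return the (low, high) estimate of total VRAM use in GB."""
    low, high = inference_gb_range
    return weights_gb + low, weights_gb + high


def fit_status(gpu_vram_gb):
    """Classify a GPU against the estimated budget.

    Hypothetical helper; the labels are this sketch's own convention:
      "fits"      - VRAM covers the upper estimate
      "tight"     - VRAM covers the lower estimate only
      "too small" - VRAM is below the lower estimate
    """
    low, high = vram_budget()
    if gpu_vram_gb >= high:
        return "fits"
    if gpu_vram_gb >= low:
        return "tight"
    return "too small"


if __name__ == "__main__":
    print("Estimated total (GB):", vram_budget())
    for label, vram in [("48 GB card", 48), ("32 GB card", 32), ("24 GB card", 24)]:
        print(label, "->", fit_status(vram))
```

A 48 GB card falls in the "tight" band because the upper estimate (~49 GB) is slightly above it, which suggests there is little headroom at the high end of the inference range. Cards with 24 GB or 32 GB fall below even the lower estimate.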