---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- int8
- comfyui
- custom-nodes
- autoregressive
- DiT
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
---

# Hunyuan Image 3.0 Instruct -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 Instruct model (v2). Supports text-to-image, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement (recaption/think_recaption).

## What's New in v2

v2 uses an improved quantization recipe with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

## Key Features

- **Instruct model** -- supports text-to-image, image editing, and multi-image fusion
- **Chain-of-Thought** -- built-in `think_recaption` mode for the highest quality
- **INT8 quantized** -- ~83 GB on disk
- **50 diffusion steps** (full quality)
- **Block swap support** -- offload transformer blocks to CPU for lower VRAM
- **ComfyUI ready** -- works with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~92-100 GB** |

**Recommended Hardware:**

- **NVIDIA RTX 6000 Blackwell (96 GB)** -- fits entirely, with headroom
- With block swap (4-8 blocks): fits on 64-80 GB GPUs
- **NVIDIA RTX 6000 Ada (48 GB)** -- requires significant block swap

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct (Full)
- **Quantization:** INT8 per-channel quantization via bitsandbytes
- **Diffusion Steps:** 50
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Language:** English and Chinese prompts

## Quantization Details

**Layers quantized to INT8:**

- Feed-forward networks (FFN/MLP layers)
- Expert layers in the MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**

- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your preferred models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `int8` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |

## Block Swap

Block swap allows running the INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on CPU and swaps them onto the GPU on demand during each diffusion step.
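The swap loop can be sketched in plain PyTorch. This is illustrative only: `BlockSwapRunner` is a hypothetical helper, not part of this repo or of the ComfyUI nodes, which manage swapping internally.

```python
import torch
from torch import nn

class BlockSwapRunner:
    """Minimal block-swap sketch: keep the last N transformer blocks on
    CPU and move each one to the GPU only for its forward pass."""

    def __init__(self, blocks, blocks_to_swap, device="cuda"):
        self.blocks = list(blocks)
        self.device = device
        # The last `blocks_to_swap` blocks live on CPU between uses.
        self.swap_set = set(range(len(self.blocks) - blocks_to_swap,
                                  len(self.blocks)))
        for i, blk in enumerate(self.blocks):
            blk.to("cpu" if i in self.swap_set else device)

    def forward(self, x):
        for i, blk in enumerate(self.blocks):
            if i in self.swap_set:
                blk.to(self.device)   # swap in on demand
            x = blk(x)
            if i in self.swap_set:
                blk.to("cpu")         # swap back out to free VRAM
        return x
```

The trade-off is the PCIe transfer cost of moving each swapped block in and out once per diffusion step, which is why fewer swapped blocks run faster.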
| blocks_to_swap | VRAM Saved | Recommended For |
|----------------|------------|-----------------|
| 0 | 0 GB | 96 GB+ GPU (no swap needed) |
| 4 | ~10 GB | 80-90 GB GPU |
| 8 | ~20 GB | 64-80 GB GPU |
| 16 | ~40 GB | 48-64 GB GPU |
| -1 (auto) | varies | Let the system decide |

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct).

- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license of the original Hunyuan Image 3.0 model: [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)
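## VRAM Estimation Sketch

The block-swap numbers in this card work out to roughly 2.5 GB of VRAM saved per swapped block (4 blocks -> ~10 GB, 8 -> ~20 GB, 16 -> ~40 GB). A small, hypothetical helper (not part of the model repo) that combines this with the ~80 GB weight footprint and ~12-20 GB inference overhead listed above:

```python
# Rough VRAM estimate from the figures in this card; an approximation,
# not a measurement -- actual usage varies with resolution and batch.
GB_PER_SWAPPED_BLOCK = 2.5
WEIGHTS_GB = 80.0
INFERENCE_GB = (12.0, 20.0)  # additional low/high overhead

def estimated_vram_gb(blocks_to_swap: int) -> tuple[float, float]:
    """Return a (low, high) VRAM estimate in GB for a given swap count."""
    resident = max(WEIGHTS_GB - blocks_to_swap * GB_PER_SWAPPED_BLOCK, 0.0)
    return (resident + INFERENCE_GB[0], resident + INFERENCE_GB[1])

print(estimated_vram_gb(0))   # (92.0, 100.0) -- matches the ~92-100 GB total
print(estimated_vram_gb(8))   # (72.0, 80.0)  -- in the 64-80 GB GPU range
```

Pick the smallest `blocks_to_swap` whose high estimate fits your GPU, since each swapped block adds CPU-GPU transfer time per diffusion step.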