---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- int8
- comfyui
- custom-nodes
- autoregressive
- DiT
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
---

# Hunyuan Image 3.0 Instruct -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 Instruct model (v2). Supports text-to-image generation, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement (`recaption`/`think_recaption`).

## What's New in v2

v2 uses an improved quantization recipe with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

## Key Features

- **Instruct model** -- supports text-to-image, image editing, and multi-image fusion
- **Chain-of-Thought** -- built-in `think_recaption` mode for the highest quality
- **INT8 quantized** -- ~83 GB on disk
- **50 diffusion steps** (full quality)
- **Block swap support** -- offloads transformer blocks to CPU for lower VRAM use
- **ComfyUI ready** -- works with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~92-100 GB** |

**Recommended hardware:**

- **NVIDIA RTX 6000 Blackwell (96 GB)** -- fits entirely, with headroom
- **64-80 GB GPUs** -- fit with block swap (4-8 blocks)
- **NVIDIA RTX 6000 Ada (48 GB)** -- requires significant block swap

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct (Full)
- **Quantization:** INT8 per-channel quantization via bitsandbytes
- **Diffusion Steps:** 50
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Languages:** English and Chinese prompts

## Quantization Details

**Layers quantized to INT8:**

- Feed-forward networks (FFN/MLP layers)
- Expert layers in the MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**

- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
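
The skip rule above can be sketched as a simple name filter. This is illustrative only: the substrings below are assumptions about module naming, not the checkpoint's exact parameter names.

```python
# Illustrative sketch of the BF16 skip rule described above.
# Pattern strings are assumed names, not the checkpoint's actual module paths.
BF16_SKIP_PATTERNS = (
    "vae",                                   # VAE encoder/decoder
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "patch_embed",                           # patch embedding
    "time_embed",                            # time embedding
    "vision_model",                          # SigLIP2 vision tower
    "final_layer",                           # output head
)

def should_quantize(module_name: str) -> bool:
    """Return True if a linear module should be converted to INT8."""
    return not any(p in module_name for p in BF16_SKIP_PATTERNS)
```

With bitsandbytes, such a skip list is typically passed to `transformers.BitsAndBytesConfig` via its `llm_int8_skip_modules` argument.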

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your preferred models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `int8` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |
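
The mode is chosen per generation call. A minimal sketch of validating the mode string before dispatching -- the `bot_task` keyword name here is a hypothetical stand-in taken from this section's title, not a confirmed API parameter; check the generate node's options for the real name:

```python
# Hypothetical helper: validate a generation mode before dispatching.
# The "bot_task" keyword name is an assumption, not a confirmed API.
VALID_BOT_TASKS = ("image", "recaption", "think_recaption")

def generation_kwargs(prompt: str, bot_task: str = "image") -> dict:
    """Build keyword arguments for a generation call, rejecting unknown modes."""
    if bot_task not in VALID_BOT_TASKS:
        raise ValueError(f"bot_task must be one of {VALID_BOT_TASKS}, got {bot_task!r}")
    return {"prompt": prompt, "bot_task": bot_task}
```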

## Block Swap

Block swap allows running the INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on the CPU and swaps them to the GPU on demand during each diffusion step.

| blocks_to_swap | VRAM Saved | Recommended For |
|---------------|------------|-----------------|
| 0 | 0 GB | 96 GB+ GPU (no swap needed) |
| 4 | ~10 GB | 80-90 GB GPU |
| 8 | ~20 GB | 64-80 GB GPU |
| 16 | ~40 GB | 48-64 GB GPU |
| -1 (auto) | varies | Let the system decide |
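
Per the table, each swapped block frees roughly 2.5 GB (~10 GB / 4 blocks). A rough sizing sketch under that assumption, using the ~92-100 GB total from the VRAM section as a ~95 GB baseline (both figures are estimates from this card, not measurements):

```python
import math

# Rough sizing helper derived from the block-swap table above.
# GB_PER_BLOCK and BASELINE_VRAM_GB are estimates from this card, not measured.
GB_PER_BLOCK = 2.5       # ~10 GB saved per 4 swapped blocks
BASELINE_VRAM_GB = 95.0  # midpoint of the ~92-100 GB total

def suggest_blocks_to_swap(gpu_vram_gb: float) -> int:
    """Smallest blocks_to_swap that should fit the given GPU, per the table."""
    deficit = BASELINE_VRAM_GB - gpu_vram_gb
    if deficit <= 0:
        return 0
    return math.ceil(deficit / GB_PER_BLOCK)
```

For example, an 80 GB card would need about 6 blocks swapped under these estimates; in practice, `-1` (auto) lets the node pick for you.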

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct).

- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license of the original HunyuanImage-3.0 model:
[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)