---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
base_model: tencent/HunyuanImage-3.0-Instruct
pipeline_tag: text-to-image
library_name: transformers
tags:
- Hunyuan
- hunyuan
- quantization
- int8
- comfyui
- custom-nodes
- autoregressive
- DiT
- HunyuanImage-3.0
- instruct
- image-editing
- bitsandbytes
---

# Hunyuan Image 3.0 Instruct -- INT8 Quantized (v2)

INT8 quantization of the HunyuanImage-3.0 Instruct model (v2). Supports text-to-image generation, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement (`recaption`/`think_recaption`).

## What's New in v2

v2 uses an improved quantization recipe with more precise skip-module selection, keeping attention projections and critical embedding layers in full BF16 precision for better image quality.

## Key Features

- **Instruct model** -- supports text-to-image, image editing, and multi-image fusion
- **Chain-of-Thought** -- built-in `think_recaption` mode for the highest quality
- **INT8 quantized** -- ~83 GB on disk
- **50 diffusion steps** (full quality)
- **Block swap support** -- offloads transformer blocks to CPU for lower VRAM use
- **ComfyUI ready** -- works with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

## VRAM Requirements

| Component | Memory |
|-----------|--------|
| Weight loading | ~80 GB |
| Inference (additional) | ~12-20 GB |
| **Total** | **~92-100 GB** |

**Recommended hardware:**

- **NVIDIA RTX 6000 Blackwell (96 GB)** -- fits entirely, with headroom
- **64-80 GB GPUs** -- fit with block swap (4-8 blocks)
- **NVIDIA RTX 6000 Ada (48 GB)** -- requires significant block swap

## Model Details

- **Architecture:** HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
- **Parameters:** 80B total, 13B active per token (top-K MoE routing)
- **Variant:** Instruct (Full)
- **Quantization:** INT8 per-channel quantization via bitsandbytes
- **Diffusion Steps:** 50
- **Default Guidance Scale:** 2.5
- **Resolution:** Up to 2048x2048
- **Languages:** English and Chinese prompts

## Quantization Details

**Layers quantized to INT8:**

- Feed-forward networks (FFN/MLP layers)
- Expert layers in the MoE architecture (64 experts per layer)
- Large linear transformations

**Kept in full precision (BF16):**

- VAE encoder/decoder (critical for image quality)
- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
- Patch embedding layers
- Time embedding layers
- Vision model (SigLIP2)
- Final output layers
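
The skip rule above can be sketched as a simple name filter. This is illustrative only: the substrings below are assumptions about module naming, not the checkpoint's exact parameter names.

```python
# Illustrative sketch of the BF16 skip rule described above.
# Pattern strings are assumed names, not the checkpoint's actual module paths.
BF16_SKIP_PATTERNS = (
    "vae",                                   # VAE encoder/decoder
    "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
    "patch_embed",                           # patch embedding
    "time_embed",                            # time embedding
    "vision_model",                          # SigLIP2 vision tower
    "final_layer",                           # output head
)

def should_quantize(module_name: str) -> bool:
    """Return True if a linear module should be converted to INT8."""
    return not any(p in module_name for p in BF16_SKIP_PATTERNS)
```

With bitsandbytes, such a skip list is typically passed to `transformers.BitsAndBytesConfig` via its `llm_int8_skip_modules` argument.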

## Usage

### ComfyUI (Recommended)

This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

```bash
cd ComfyUI/custom_nodes
git clone https://github.com/EricRollei/Comfy_HunyuanImage3
```

1. Download this model to your preferred models directory
2. Use the **"Hunyuan 3 Instruct Loader"** node
3. Select this model folder and choose `int8` precision
4. Connect to the **"Hunyuan 3 Instruct Generate"** node for text-to-image
5. Or use **"Hunyuan 3 Instruct Edit"** for image editing
6. Or use **"Hunyuan 3 Instruct Multi-Fusion"** for combining multiple images

### Bot Task Modes

The Instruct model supports three generation modes:

| Mode | Description | Speed |
|------|-------------|-------|
| `image` | Direct text-to-image; prompt used as-is | Fastest |
| `recaption` | Model rewrites the prompt into a detailed description, then generates | Medium |
| `think_recaption` | CoT reasoning -> prompt enhancement -> generation (best quality) | Slowest |
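
The mode is chosen per generation call. A minimal sketch of validating the mode string before dispatching -- the `bot_task` keyword name here is a hypothetical stand-in taken from this section's title, not a confirmed API parameter; check the generate node's options for the real name:

```python
# Hypothetical helper: validate a generation mode before dispatching.
# The "bot_task" keyword name is an assumption, not a confirmed API.
VALID_BOT_TASKS = ("image", "recaption", "think_recaption")

def generation_kwargs(prompt: str, bot_task: str = "image") -> dict:
    """Build keyword arguments for a generation call, rejecting unknown modes."""
    if bot_task not in VALID_BOT_TASKS:
        raise ValueError(f"bot_task must be one of {VALID_BOT_TASKS}, got {bot_task!r}")
    return {"prompt": prompt, "bot_task": bot_task}
```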

## Block Swap

Block swap allows running the INT8 and BF16 models on GPUs with less VRAM than the full model requires. The system keeps N transformer blocks on the CPU and swaps them to the GPU on demand during each diffusion step.

| blocks_to_swap | VRAM Saved | Recommended For |
|---------------|------------|-----------------|
| 0 | 0 GB | 96 GB+ GPU (no swap needed) |
| 4 | ~10 GB | 80-90 GB GPU |
| 8 | ~20 GB | 64-80 GB GPU |
| 16 | ~40 GB | 48-64 GB GPU |
| -1 (auto) | varies | Let the system decide |
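
Per the table, each swapped block frees roughly 2.5 GB (~10 GB / 4 blocks). A rough sizing sketch under that assumption, using the ~92-100 GB total from the VRAM section as a ~95 GB baseline (both figures are estimates from this card, not measurements):

```python
import math

# Rough sizing helper derived from the block-swap table above.
# GB_PER_BLOCK and BASELINE_VRAM_GB are estimates from this card, not measured.
GB_PER_BLOCK = 2.5       # ~10 GB saved per 4 swapped blocks
BASELINE_VRAM_GB = 95.0  # midpoint of the ~92-100 GB total

def suggest_blocks_to_swap(gpu_vram_gb: float) -> int:
    """Smallest blocks_to_swap that should fit the given GPU, per the table."""
    deficit = BASELINE_VRAM_GB - gpu_vram_gb
    if deficit <= 0:
        return 0
    return math.ceil(deficit / GB_PER_BLOCK)
```

For example, an 80 GB card would need about 6 blocks swapped under these estimates; in practice, `-1` (auto) lets the node pick for you.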

## Original Model

This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct).

- **License:** [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

## Credits

- **Original Model:** [Tencent Hunyuan Team](https://huggingface.co/tencent)
- **Quantization:** Eric Rollei
- **ComfyUI Integration:** [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

## License

This model inherits the license of the original HunyuanImage-3.0 model:
[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)