	---
	license: other
	license_name: tencent-hunyuan-community
	license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
	base_model: tencent/HunyuanImage-3.0-Instruct
	pipeline_tag: text-to-image
	library_name: transformers
	tags:
	- Hunyuan
	- hunyuan
	- quantization
	- nf4
	- comfyui
	- custom nodes
	- autoregressive
	- Dit
	- HunyuanImage-3.0
	- instruct
	- image-editing
	- bitsandbytes
	- 4bit
	---

	# Hunyuan Image 3.0 Instruct — NF4 Quantized

	NF4 (4-bit) quantization of the HunyuanImage-3.0 Instruct model. Fits on a single 48GB GPU. Supports text-to-image, image editing, multi-image fusion, and Chain-of-Thought prompt enhancement.

	## Key Features

	- 🎯 Instruct model — supports text-to-image, image editing, multi-image fusion
	- 🧠 Chain-of-Thought — built-in `think_recaption` mode for highest quality
	- 💾 NF4 quantized — ~45 GB on disk
	- ⚡ 50 diffusion steps (full quality)
	- 🔧 ComfyUI ready — works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

	## VRAM Requirements

	| Component | Memory |
	|-----------|--------|
	| Weight loading | ~29 GB |
	| Inference (additional) | ~12-20 GB |
	| Total | ~41-49 GB |

	Recommended Hardware:

	- Fits on a single 48GB GPU (RTX 6000 Ada, RTX PRO 5000, A6000)
	- Consumer GPUs (RTX 4090 24GB, RTX 5090 32GB): not enough VRAM
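
The totals in the table are the weight footprint plus the inference overhead. A minimal sketch of that arithmetic follows; the numbers come from the table, while the three labels (`insufficient`, `borderline`, `comfortable`) and the function names are illustrative choices, not part of the model or the ComfyUI nodes.

```python
WEIGHTS_GB = 29           # "Weight loading" row
INFERENCE_GB = (12, 20)   # "Inference (additional)" row: low and high estimate


def total_vram_range():
    """Return the (low, high) total VRAM estimate in GB."""
    low, high = INFERENCE_GB
    return WEIGHTS_GB + low, WEIGHTS_GB + high


def classify_gpu(vram_gb):
    """Compare a card's VRAM against the estimated range.

    The labels are hypothetical: 'borderline' means the card covers the
    low estimate but not the high one.
    """
    low, high = total_vram_range()
    if vram_gb < low:
        return "insufficient"
    if vram_gb < high:
        return "borderline"
    return "comfortable"


if __name__ == "__main__":
    for name, vram in [("RTX 4090", 24), ("RTX 5090", 32), ("RTX 6000 Ada", 48)]:
        print(f"{name} ({vram} GB): {classify_gpu(vram)}")
```

A 48 GB card lands inside the estimated range rather than above it, so the actual headroom probably depends on resolution and generation mode.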

	## Model Details

	- Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
	- Parameters: 80B total, 13B active per token (top-K MoE routing)
	- Variant: Instruct (Full)
	- Quantization: 4-bit NormalFloat (NF4) via bitsandbytes, with double quantization
	- Diffusion Steps: 50
	- Default Guidance Scale: 2.5
	- Resolution: Up to 2048x2048
	- Language: English and Chinese prompts

	## Quantization Details

	Layers quantized to NF4:
	- Feed-forward networks (FFN/MLP layers)
	- Expert layers in MoE architecture (64 experts per layer)
	- Large linear transformations

	Kept in full precision (BF16):
	- VAE encoder/decoder (critical for image quality)
	- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
	- Patch embedding layers
	- Time embedding layers
	- Vision model (SigLIP2)
	- Final output layers
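
The split above can be sketched as a name-based filter plus a set of quantization settings. This is an illustration under assumptions, not the script used to produce this checkpoint: the keyword names match real `transformers.BitsAndBytesConfig` parameters, but the values are inferred from this README, and every module name other than `q_proj`, `k_proj`, `v_proj`, and `o_proj` is a hypothetical placeholder.

```python
# Keyword names match real transformers.BitsAndBytesConfig parameters.
# The values are inferred from the description above (assumption).
NF4_CONFIG_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}

# Module-path components kept in BF16. Only the four attention projections
# are named in this README; the remaining entries are hypothetical and must
# be checked against the real checkpoint's module names.
KEEP_BF16 = {
    "q_proj", "k_proj", "v_proj", "o_proj",
    "vae", "patch_embed", "time_embed", "vision_model", "final_layer",
}


def stays_bf16(module_name):
    """True if any dotted component of the module path is in KEEP_BF16."""
    return any(part in KEEP_BF16 for part in module_name.split("."))


def split_modules(module_names):
    """Partition module names into (bf16, nf4) lists, preserving order."""
    bf16 = [n for n in module_names if stays_bf16(n)]
    nf4 = [n for n in module_names if not stays_bf16(n)]
    return bf16, nf4
```

With `transformers` and `bitsandbytes` installed, these settings could be passed as `BitsAndBytesConfig(**NF4_CONFIG_KWARGS, llm_int8_skip_modules=sorted(KEEP_BF16))`, assuming the placeholder names are replaced with the real ones.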

	## Usage

	### ComfyUI (Recommended)

	This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

	```bash
	cd ComfyUI/custom_nodes
	git clone https://github.com/EricRollei/Comfy_HunyuanImage3
	```

	1. Download this model to your ComfyUI models directory
	2. Use the "Hunyuan 3 Instruct Loader" node
	3. Select this model folder and choose `nf4` precision
	4. Connect to the "Hunyuan 3 Instruct Generate" node for text-to-image
	5. Or use "Hunyuan 3 Instruct Edit" for image editing
	6. Or use "Hunyuan 3 Instruct Multi-Fusion" for combining multiple images
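
Step 1 can also be scripted. The sketch below uses `snapshot_download` from `huggingface_hub` with the repository ID taken from the citation URL. The target layout (`<ComfyUI>/models/<repo name>`) is an assumption; check the custom node's documentation for the folder it actually scans.

```python
from pathlib import Path

REPO_ID = "EricRollei/HunyuanImage-3.0-Instruct-NF4"  # from the citation URL


def model_dir(comfy_root, subfolder="models"):
    """Build the target folder. The layout under `models` is an assumption."""
    return Path(comfy_root) / subfolder / REPO_ID.split("/")[-1]


def download_model(comfy_root):
    """Fetch the repository snapshot into the ComfyUI models directory."""
    # Imported lazily so model_dir() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = model_dir(comfy_root)
    target.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target
```

Calling `download_model("ComfyUI")` would start a download of roughly 45 GB, so make sure the disk has room first.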

	### Bot Task Modes

	The Instruct model supports three generation modes:

	| Mode | Description | Speed |
	|------|-------------|-------|
	| `image` | Direct text-to-image, prompt used as-is | Fastest |
	| `recaption` | Model rewrites prompt into detailed description, then generates | Medium |
	| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |
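
The trade-off in the table can be written down as a small selection helper. The three mode strings come from the table; the `BotTask` enum and `pick_mode` function are hypothetical names for illustration, and how the chosen value is passed to the nodes or the model is not shown here.

```python
from enum import Enum


class BotTask(str, Enum):
    """The three generation modes listed in the table."""
    IMAGE = "image"
    RECAPTION = "recaption"
    THINK_RECAPTION = "think_recaption"


# Ordered fastest to slowest, following the Speed column.
SPEED_ORDER = [BotTask.IMAGE, BotTask.RECAPTION, BotTask.THINK_RECAPTION]


def pick_mode(rewrite_prompt, use_reasoning=False):
    """Choose a mode from two yes/no preferences.

    use_reasoning implies prompt rewriting, so it takes priority.
    """
    if use_reasoning:
        return BotTask.THINK_RECAPTION
    if rewrite_prompt:
        return BotTask.RECAPTION
    return BotTask.IMAGE
```

For example, `pick_mode(rewrite_prompt=False)` selects `image`, the fastest mode, while `pick_mode(True, use_reasoning=True)` selects `think_recaption`.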

	## Original Model

	This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct).

	- Architecture: Diffusion Transformer with Mixture-of-Experts
	- Resolution: Up to 2048x2048
	- Language Support: English and Chinese prompts
	- License: [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

	## Limitations

	- Requires a high-end professional GPU (~41-49 GB VRAM)
	- NF4 quantization may introduce minor quality differences in edge cases
	- Model loading adds ~1-2 minutes of overhead to the first generation
	- CoT/recaption modes require additional time for the text generation phase

	## Credits

	- Original Model: [Tencent Hunyuan Team](https://huggingface.co/tencent)
	- Quantization: Eric Rollei
	- ComfyUI Integration: [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

	## License

	This model inherits the license from the original Hunyuan Image 3.0 model:
	[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

	Please review the original license for commercial use restrictions and requirements.

	## Citation

	```bibtex
	@misc{hunyuan-image-3-nf4-instruct,
	author = {Rollei, Eric},
	title = {Hunyuan Image 3.0 Instruct — NF4 Quantized},
	year = {2026},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-NF4}}
	}
	```