	---
	license: other
	license_name: tencent-hunyuan-community
	license_link: https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt
	base_model: tencent/HunyuanImage-3.0-Instruct-Distil
	pipeline_tag: text-to-image
	library_name: transformers
	tags:
	- Hunyuan
	- hunyuan
	- quantization
	- int8
	- comfyui
	- custom nodes
	- autoregressive
	- DiT
	- HunyuanImage-3.0
	- instruct
	- image-editing
	- bitsandbytes
	- distilled
	---

	# Hunyuan Image 3.0 Instruct Distil — INT8 Quantized

	INT8 quantization of the HunyuanImage-3.0 Instruct Distil model. The Distil variant is CFG-distilled and needs only 8 diffusion steps instead of the full Instruct model's 50 (~6x fewer). This gives much faster inference, with output quality intended to match the full Instruct model.

	## Key Features

	- 🎯 Instruct model — supports text-to-image, image editing, multi-image fusion
	- 🧠 Chain-of-Thought — built-in `think_recaption` mode for highest quality
	- 💾 INT8 quantized — ~81 GB on disk
	- ⚡ 8 diffusion steps (CFG-distilled for speed)
	- 🔧 ComfyUI ready — works with [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) nodes

	## VRAM Requirements

	| Component | Memory |
	|-----------|--------|
	| Model weights | ~80 GB |
	| Inference (additional) | ~12-20 GB |
	| Total | ~92-100 GB |
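The totals above follow from simple arithmetic. INT8 stores one byte per quantized weight, so ~80B parameters occupy ~80 GB, and inference adds the stated ~12-20 GB. The minimal sketch below assumes decimal gigabytes and treats all 80B parameters as INT8. The BF16 components use 2 bytes per weight, so the real figure is slightly larger.

```python
# Back-of-the-envelope VRAM estimate in decimal GB (1 GB = 1e9 bytes).
# Assumption: every one of the 80B parameters is a 1-byte INT8 value; the BF16
# components (2 bytes per weight) make the real footprint slightly larger.

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Billions of parameters x bytes per parameter = GB of weights."""
    return params_billion * bytes_per_param

def total_vram_range_gb(params_billion: float, bytes_per_param: float,
                        overhead_gb: tuple[float, float]) -> tuple[float, float]:
    """Weights plus the (low, high) additional memory used during inference."""
    weights = weight_memory_gb(params_billion, bytes_per_param)
    return weights + overhead_gb[0], weights + overhead_gb[1]

def fits_on_gpu(vram_gb: float, required_gb: float) -> bool:
    return required_gb <= vram_gb

if __name__ == "__main__":
    low, high = total_vram_range_gb(80, 1, (12, 20))
    print(f"INT8 total: ~{low:.0f}-{high:.0f} GB")                   # INT8 total: ~92-100 GB
    print(f"BF16 weights alone: ~{weight_memory_gb(80, 2):.0f} GB")  # BF16 weights alone: ~160 GB
    for card, vram in [("96 GB card", 96), ("48 GB card", 48)]:
        print(card, "fits low end:", fits_on_gpu(vram, low),
              "fits high end:", fits_on_gpu(vram, high))
```

Applied to the hardware list, this arithmetic shows that a 96 GB card covers the low end of the estimate but not the high end. A 48 GB card covers neither, which is why it needs offloading.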

	Recommended Hardware:

	- NVIDIA RTX PRO 6000 Blackwell (96GB) — fits entirely in VRAM ✅, though the upper end of the ~92-100 GB estimate exceeds 96 GB, so headroom is tight
	- NVIDIA RTX 6000 Ada (48GB) — requires CPU offloading
	- Multi-GPU setups with 80GB+ combined VRAM (below the ~92-100 GB total, expect some CPU offloading)
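Cards below the ~92-100 GB total must keep part of the model in CPU RAM. Hugging Face transformers (through accelerate) accepts a `max_memory` dict, which maps GPU indices and `"cpu"` to memory budgets, when loading with `device_map="auto"`. The helper below is a hypothetical sketch that builds such a dict while holding back room for inference. The 16 GB reserve and 128 GB of CPU RAM are example values. This card does not state whether the ComfyUI nodes or the model's remote code rely on this mechanism, so treat it as an assumption.

```python
# Hypothetical helper for cards that cannot hold the whole model.
# Builds a memory budget in the dict format that transformers/accelerate accept
# for from_pretrained(..., device_map="auto", max_memory=...):
#   {gpu_index: "<N>GB", ..., "cpu": "<N>GB"}  (decimal GB, as on this card)
# Assumption: `reserve_gb` is held back on every GPU for inference buffers.

def build_max_memory(gpu_vram_gb: list[int], reserve_gb: int, cpu_ram_gb: int) -> dict:
    budget: dict = {}
    for index, vram in enumerate(gpu_vram_gb):
        budget[index] = f"{max(vram - reserve_gb, 0)}GB"
    budget["cpu"] = f"{cpu_ram_gb}GB"
    return budget

def offloaded_gb(gpu_vram_gb: list[int], reserve_gb: int, weights_gb: int) -> int:
    """Weights that do not fit in the GPU budgets and would spill to CPU RAM."""
    on_gpu = sum(max(vram - reserve_gb, 0) for vram in gpu_vram_gb)
    return max(weights_gb - on_gpu, 0)

if __name__ == "__main__":
    # Single 48 GB card (e.g. RTX 6000 Ada), reserving 16 GB for inference.
    budget = build_max_memory([48], 16, 128)
    print(budget)                      # {0: '32GB', 'cpu': '128GB'}
    print(offloaded_gb([48], 16, 80))  # 48
    # Assumed loading call (remote code; not verified against this checkpoint):
    # model = AutoModelForCausalLM.from_pretrained(
    #     path, device_map="auto", max_memory=budget, trust_remote_code=True)
```

Reserving memory on every GPU is a deliberately conservative choice. On a multi-GPU machine you might instead reserve only on the GPU that holds the activations.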

	## Model Details

	- Architecture: HunyuanImage-3.0 Mixture-of-Experts Diffusion Transformer
	- Parameters: 80B total, 13B active per token (top-K MoE routing)
	- Variant: Instruct Distil (CFG-Distilled, 8-step)
	- Quantization: INT8 per-channel quantization via bitsandbytes
	- Diffusion Steps: 8
	- Default Guidance Scale: 2.5
	- Resolution: Up to 2048x2048
	- Language: English and Chinese prompts
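The "13B active per token" figure comes from top-K routing. A small router scores every expert for each token, and only the K highest-scoring experts run. Every expert must still be resident in memory, however, which is why VRAM tracks the 80B total rather than the 13B active. The pure-Python sketch below shows softmax top-K gating. The 64-expert count comes from the Quantization Details section. K = 8 and the gate renormalization are illustrative assumptions, not confirmed details of HunyuanImage-3.0.

```python
import math
import random

def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Pick the k highest-probability experts and renormalize their gate weights."""
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]

def moe_forward(x: float, experts, router_logits: list[float], k: int) -> float:
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(router_logits, k))

if __name__ == "__main__":
    num_experts, k = 64, 8  # 64 experts per layer (this card); k = 8 is an assumption
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(num_experts)]
    experts = [lambda x, scale=i: scale * x for i in range(num_experts)]  # toy experts
    routes = top_k_route(logits, k)
    print(f"{len(routes)} of {num_experts} experts active for this token")
    print(round(sum(w for _, w in routes), 6))  # 1.0
    print(moe_forward(1.0, experts, logits, k))
```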

	### Distillation

	This is the CFG-Distilled variant, which means:
	- Only 8 diffusion steps needed (vs 50 for the full Instruct model)
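	- ~6x fewer diffusion steps (50 / 8 ≈ 6.25), so image generation is much faster
	- Distilled to match the full model's output quality
	- `cfg_distilled: true` in the config indicates the model produces guided predictions directly, so no separate classifier-free guidance pass is needed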
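Standard classifier-free guidance evaluates the model with and without the conditioning at each step. It then extrapolates between the two predictions as `uncond + scale * (cond - uncond)`. A CFG-distilled model is trained to produce the guided result in a single evaluation, which removes that extra pass. The sketch below shows the standard combination on plain lists, along with the step-count ratio behind the "~6x" figure. It illustrates the general technique, not HunyuanImage-3.0's actual sampler.

```python
def cfg_combine(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def step_ratio(full_steps: int, distilled_steps: int) -> float:
    """How many times fewer diffusion steps the distilled model runs."""
    return full_steps / distilled_steps

if __name__ == "__main__":
    # scale = 1.0 reduces to the conditional prediction; larger scales push further.
    print(cfg_combine([0.0, 1.0], [1.0, 1.0], 1.0))  # [1.0, 1.0]
    print(cfg_combine([0.0, 1.0], [1.0, 1.0], 2.5))  # [2.5, 1.0]
    print(step_ratio(50, 8))                          # 6.25
```

The step ratio counts diffusion steps only. End-to-end speedup also depends on any text-generation phase, such as the recaption modes described below.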

	## Quantization Details

	Layers quantized to INT8:
	- Feed-forward networks (FFN/MLP layers)
	- Expert layers in MoE architecture (64 experts per layer)
	- Large linear transformations
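Per-channel INT8 quantization, as named under Model Details, stores each weight row as 8-bit integers plus one floating-point scale per row. The scale is `max|w| / 127`, and each weight becomes `round(w / scale)`. The pure-Python sketch below implements this absmax scheme. bitsandbytes' LLM.int8() additionally handles outlier activation features in higher precision during the matmul, and this sketch does not model that.

```python
def quantize_row(row: list[float]) -> tuple[list[int], float]:
    """Symmetric absmax INT8 quantization of one output channel (row)."""
    absmax = max(abs(w) for w in row)
    scale = absmax / 127 if absmax > 0 else 1.0  # avoid dividing by zero on all-zero rows
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale

def dequantize_row(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

def quantize_per_channel(weight: list[list[float]]) -> list[tuple[list[int], float]]:
    """One scale per row, so a row with large values doesn't cost precision in small rows."""
    return [quantize_row(row) for row in weight]

if __name__ == "__main__":
    weight = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]
    for row, (q, scale) in zip(weight, quantize_per_channel(weight)):
        restored = dequantize_row(q, scale)
        worst = max(abs(a - b) for a, b in zip(row, restored))
        print(q, f"scale={scale:.6f}", f"max error={worst:.2e}")
```

The rounding error per weight is at most half that row's scale. The second row's small values therefore keep far finer resolution than they would under a single scale shared with the first row.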

	Kept unquantized in BF16:
	- VAE encoder/decoder (critical for image quality)
	- Attention projection layers (q_proj, k_proj, v_proj, o_proj)
	- Patch embedding layers
	- Time embedding layers
	- Vision model (SigLIP2)
	- Final output layers
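When quantizing with bitsandbytes through transformers, `BitsAndBytesConfig(load_in_8bit=True, llm_int8_skip_modules=[...])` lists modules to leave unquantized. This card does not give the exact module names inside HunyuanImage-3.0, so the name fragments below are hypothetical placeholders. The substring matching is this sketch's own choice. The sketch only shows how a keep-list like the one above can be matched against module names; it is not the procedure used to produce this checkpoint.

```python
# Hypothetical name fragments for the components this card keeps in BF16.
# The real module names in HunyuanImage-3.0 may differ.
KEEP_BF16 = ("vae", "q_proj", "k_proj", "v_proj", "o_proj",
             "patch_embed", "time_embed", "vision_model", "final_layer")

def should_quantize(module_name: str, keep: tuple[str, ...] = KEEP_BF16) -> bool:
    """Quantize a linear layer to INT8 unless its name contains a keep fragment."""
    return not any(fragment in module_name for fragment in keep)

def split_modules(names: list[str]) -> tuple[list[str], list[str]]:
    """Partition module names into (INT8, BF16) groups."""
    int8 = [n for n in names if should_quantize(n)]
    bf16 = [n for n in names if not should_quantize(n)]
    return int8, bf16

if __name__ == "__main__":
    names = [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.mlp.experts.3.down_proj",
        "vae.decoder.conv_out",
        "model.layers.0.mlp.gate_proj",
    ]
    int8, bf16 = split_modules(names)
    print("INT8:", int8)
    print("BF16:", bf16)
```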

	## Usage

	### ComfyUI (Recommended)

	This model is designed to work with the [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) custom nodes:

	```bash
	cd ComfyUI/custom_nodes
	git clone https://github.com/EricRollei/Comfy_HunyuanImage3
	```

	1. Download this model to your ComfyUI models directory
	2. Use the "Hunyuan 3 Instruct Loader" node
	3. Select this model folder and choose `int8` precision
	4. Connect to the "Hunyuan 3 Instruct Generate" node for text-to-image
	5. Or use "Hunyuan 3 Instruct Edit" for image editing
	6. Or use "Hunyuan 3 Instruct Multi-Fusion" for combining multiple images
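Step 1 can be scripted with `huggingface_hub.snapshot_download`, using the repository id from this card's citation URL. The target folder `ComfyUI/models/HunyuanImage3` is a hypothetical example; check the Comfy_HunyuanImage3 README for the directory its loader actually scans. Because the checkpoint is ~81 GB, the sketch defines the download function but does not call it.

```python
from pathlib import Path

REPO_ID = "EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8"  # from this card's citation URL

def target_dir(comfy_root: str, subfolder: str = "HunyuanImage3") -> Path:
    """Hypothetical layout: <ComfyUI>/models/<subfolder>/<repo name>."""
    return Path(comfy_root) / "models" / subfolder / REPO_ID.split("/")[-1]

def download(comfy_root: str) -> Path:
    """Fetch the full checkpoint (~81 GB) into the target folder."""
    from huggingface_hub import snapshot_download  # lazy import; pip install huggingface_hub
    dest = target_dir(comfy_root)
    dest.mkdir(parents=True, exist_ok=True)
    snapshot_download(repo_id=REPO_ID, local_dir=str(dest))
    return dest

if __name__ == "__main__":
    print(target_dir("ComfyUI"))
    # download("ComfyUI")  # uncomment to start the ~81 GB download
```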

	### Bot Task Modes

	The Instruct model supports three generation modes:

	| Mode | Description | Speed |
	|------|-------------|-------|
	| `image` | Direct text-to-image, prompt used as-is | Fastest |
	| `recaption` | Model rewrites prompt into detailed description, then generates | Medium |
	| `think_recaption` | CoT reasoning → prompt enhancement → generation (best quality) | Slowest |
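The three modes differ only in which stages run before image generation, which explains the speed ordering. The sketch below encodes the mode table. The stage names and the `stages_for` helper are hypothetical; they are not the API of the ComfyUI nodes or the model.

```python
# Stages per bot task mode, mirroring the mode table (stage names are illustrative).
MODE_STAGES: dict[str, list[str]] = {
    "image": ["generate_image"],
    "recaption": ["rewrite_prompt", "generate_image"],
    "think_recaption": ["reason", "rewrite_prompt", "generate_image"],
}

def stages_for(mode: str) -> list[str]:
    """Return the ordered stages for a mode, rejecting unknown mode names."""
    try:
        return MODE_STAGES[mode]
    except KeyError:
        valid = ", ".join(MODE_STAGES)
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}") from None

def text_stages(mode: str) -> int:
    """Number of text-generation stages that run before the diffusion steps."""
    return len(stages_for(mode)) - 1

if __name__ == "__main__":
    for mode in MODE_STAGES:
        print(f"{mode:16} {' -> '.join(stages_for(mode))}")
```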

	## Original Model

	This is a quantized derivative of [Tencent's HunyuanImage-3.0 Instruct Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil).

	- Architecture: Diffusion Transformer with Mixture-of-Experts
	- Resolution: Up to 2048x2048
	- Language Support: English and Chinese prompts
	- License: [Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

	## Limitations

	- Requires a high-end professional GPU (~92-100 GB VRAM) to run entirely on the GPU; smaller cards need CPU offloading
	- INT8 quantization may introduce minor quality differences in edge cases
	- Model loading adds ~1-2 minutes before the first generation
	- CoT/recaption modes require additional time for text generation phase

	## Credits

	- Original Model: [Tencent Hunyuan Team](https://huggingface.co/tencent)
	- Quantization: Eric Rollei
	- ComfyUI Integration: [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3)

	## License

	This model inherits the license from the original Hunyuan Image 3.0 model:
	[Tencent Hunyuan Community License](https://huggingface.co/tencent/HunyuanImage-3.0/blob/main/LICENSE.txt)

	Please review the original license for commercial use restrictions and requirements.

	## Citation

	```bibtex
	@misc{hunyuan-image-3-int8-instruct,
	author = {Rollei, Eric},
	title = {Hunyuan Image 3.0 Instruct Distil — INT8 Quantized},
	year = {2026},
	publisher = {Hugging Face},
	howpublished = {\url{https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8}}
	}
	```