Add library_name: transformers metadata (resolves PR #1 intent)

#7
by ThanhNguyxn - opened
Files changed (1)
  1. README.md +574 -573
README.md CHANGED
@@ -1,573 +1,574 @@
1
- ---
2
- license: other
3
- pipeline_tag: image-to-image
4
- ---
5
-
6
-
7
- [中文文档](./README_zh_CN.md)
8
-
9
- <div align="center">
10
-
11
- <img src="./assets/logo.png" alt="HunyuanImage-3.0 Logo" width="600">
12
-
13
- # 🎨 HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
14
-
15
- </div>
16
-
17
-
18
- <div align="center">
19
- <img src="./assets/banner.png" alt="HunyuanImage-3.0 Banner" width="800">
20
-
21
- </div>
22
-
23
- <div align="center">
24
- <a href=https://hunyuan.tencent.com/image target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
25
- <a href=https://huggingface.co/tencent/HunyuanImage-3.0-Instruct target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
26
- <a href=https://github.com/Tencent-Hunyuan/HunyuanImage-3.0 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
27
- <a href=https://arxiv.org/pdf/2509.23951 target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
28
- <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
29
- <a href=https://docs.qq.com/doc/DUVVadmhCdG9qRXBU target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a>
30
- </div>
31
-
32
-
33
- <p align="center">
34
- 👏 Join our <a href="./assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
35
💻 <a href="https://hunyuan.tencent.com/chat/HunyuanDefault?from=modelSquare&modelId=Hunyuan-Image-3.0-Instruct">Official website: Try our model!</a>&nbsp;&nbsp;
36
- </p>
37
-
38
- ## 🔥🔥🔥 News
39
-
40
- - **January 26, 2026**: 🚀 **[HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil)** - Distilled checkpoint for efficient deployment (8 steps sampling recommended).
41
- - **January 26, 2026**: 🎉 **[HunyuanImage-3.0-Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct)** - Release of **Instruct (with reasoning)** for intelligent prompt enhancement and **Image-to-Image** generation for creative editing.
42
- - **October 30, 2025**: 🚀 **[HunyuanImage-3.0 vLLM Acceleration](./vllm_infer/README.md)** - Significantly faster inference with vLLM support.
43
- - **September 28, 2025**: 📖 **[HunyuanImage-3.0 Technical Report](https://arxiv.org/pdf/2509.23951)** - Comprehensive technical documentation now available.
44
- - **September 28, 2025**: 🎉 **[HunyuanImage-3.0 Open Source](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)** - Inference code and model weights publicly available.
45
-
46
-
47
- ## 🧩 Community Contributions
48
-
49
- If you develop or use HunyuanImage-3.0 in your projects, we would love to hear about it.
50
-
51
- ## 📑 Open-source Plan
52
-
53
- - HunyuanImage-3.0 (Image Generation Model)
54
- - [x] Inference
55
- - [x] HunyuanImage-3.0 Checkpoints
56
- - [x] HunyuanImage-3.0-Instruct Checkpoints (with reasoning)
57
- - [x] vLLM Support
58
- - [x] Distilled Checkpoints
59
- - [x] Image-to-Image Generation
60
- - [ ] Multi-turn Interaction
61
-
62
-
63
- ## 🗂️ Contents
64
- - [🔥🔥🔥 News](#-news)
65
- - [🧩 Community Contributions](#-community-contributions)
66
- - [📑 Open-source Plan](#-open-source-plan)
67
- - [📖 Introduction](#-introduction)
68
- [✨ Key Features](#-key-features)
69
- - [🚀 Usage](#-usage)
70
- - [📦 Environment Setup](#-environment-setup)
71
- - [📥 Install Dependencies](#-install-dependencies)
72
- - [HunyuanImage-3.0-Instruct](#hunyuanimage-30-instruct-instruction-reasoning-and-image-to-image-generation-including-editing-and-multi-image-fusion)
73
- - [🔥 Quick Start with Transformers](#-quick-start-with-transformers)
74
- - [1️⃣ Download model weights](#1-download-model-weights)
75
- - [2️⃣ Run with Transformers](#2-run-with-transformers)
76
- - [🏠 Local Installation & Usage](#-local-installation--usage)
77
- - [1️⃣ Clone the Repository](#1-clone-the-repository)
78
- - [2️⃣ Download Model Weights](#2-download-model-weights)
79
- - [3️⃣ Run the Demo](#3-run-the-demo)
80
- - [4️⃣ Command Line Arguments](#4-command-line-arguments)
81
- - [5️⃣ For fewer Sampling Steps](#5-for-fewer-sampling-steps)
82
- - [HunyuanImage-3.0 (Text-to-image)](#hunyuanimage-30-text-to-image)
83
- - [🔥 Quick Start with Transformers](#-quick-start-with-transformers-1)
84
- - [1️⃣ Download model weights](#1-download-model-weights-1)
85
- - [2️⃣ Run with Transformers](#2-run-with-transformers-1)
86
- - [🏠 Local Installation & Usage](#-local-installation--usage-1)
87
- - [1️⃣ Clone the Repository](#1-clone-the-repository-1)
88
- - [2️⃣ Download Model Weights](#2-download-model-weights-1)
89
- - [3️⃣ Run the Demo](#3-run-the-demo-1)
90
- - [4️⃣ Command Line Arguments](#4-command-line-arguments-1)
91
- - [🎨 Interactive Gradio Demo](#-interactive-gradio-demo)
92
- - [1️⃣ Install Gradio](#1-install-gradio)
93
- - [2️⃣ Configure Environment](#2-configure-environment)
94
- - [3️⃣ Launch the Web Interface](#3-launch-the-web-interface)
95
- - [4️⃣ Access the Interface](#4-access-the-interface)
96
- - [🧱 Models Cards](#-models-cards)
97
- - [📊 Evaluation](#-evaluation)
98
- - [Evaluation of HunyuanImage-3.0-Instruct](#evaluation-of-hunyuanimage-30-instruct)
99
- - [Evaluation of HunyuanImage-3.0 (Text-to-Image)](#evaluation-of-hunyuanimage-30-text-to-image)
100
- - [🖼️ Showcase](#-showcase)
101
- - [Showcases of HunyuanImage-3.0-Instruct](#showcases-of-hunyuanimage-30-instruct)
102
- - [📚 Citation](#-citation)
103
- - [🙏 Acknowledgements](#-acknowledgements)
104
- [🌟🚀 GitHub Star History](#-github-star-history)
105
-
106
- ---
107
-
108
- ## 📖 Introduction
109
-
110
- **HunyuanImage-3.0** is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image and image-to-image model achieves performance **comparable to or surpassing** leading closed-source models.
111
-
112
-
113
- <div align="center">
114
- <img src="./assets/framework.png" alt="HunyuanImage-3.0 Framework" width="90%">
115
- </div>
116
-
117
- ## ✨ Key Features
118
-
119
- * 🧠 **Unified Multimodal Architecture:** Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables a more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.
120
-
121
- * 🏆 **The Largest Image Generation MoE Model:** This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.
122
-
123
- * 🎨 **Superior Image Generation Performance:** Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.
124
-
125
- * 💭 **Intelligent Image Understanding and World-Knowledge Reasoning:** The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It understands the user's input image and leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.
126
-
127
-
128
- ## 🚀 Usage
129
-
130
- ### 📦 Environment Setup
131
-
132
- * 🐍 **Python:** 3.12+ (recommended and tested)
133
- * **CUDA:** 12.8
134
-
135
- #### 📥 Install Dependencies
136
-
137
- ```bash
138
- # 1. First install PyTorch (CUDA 12.8 Version)
139
- pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
140
-
141
- # 2. Install tencentcloud-sdk for Prompt Enhancement (PE); needed only for HunyuanImage-3.0, not HunyuanImage-3.0-Instruct
142
- pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python
143
-
144
- # 3. Then install other dependencies
145
- pip install -r requirements.txt
146
- ```
147
-
148
- For **up to 3x faster inference**, install these optimizations:
149
-
150
- ```bash
151
- # FlashInfer for optimized MoE inference. v0.5.0 is tested.
152
- pip install flashinfer-python==0.5.0
153
- ```
154
- > 💡 **Installation Tips:** It is critical that the CUDA version used by PyTorch matches the system's CUDA version.
155
- > FlashInfer relies on this compatibility when compiling kernels at runtime.
156
- > GCC version >=9 is recommended for compiling FlashAttention and FlashInfer.
157
-
158
- > ⚡ **Performance Tips:** These optimizations can significantly speed up your inference!
159
-
160
- > 💡 **Note:** When FlashInfer is enabled, the first inference may be slower (about 10 minutes) due to kernel compilation. Subsequent inferences on the same machine will be much faster.
161
-
162
- ### HunyuanImage-3.0-Instruct (Instruction reasoning and Image-to-image generation, including editing and multi-image fusion)
163
-
164
- #### 🔥 Quick Start with Transformers
165
-
166
- ##### 1️⃣ Download model weights
167
-
168
- ```bash
169
- # Download from HuggingFace and rename the directory.
170
- # Notice that the directory name should not contain dots, which may cause issues when loading using Transformers.
171
- hf download tencent/HunyuanImage-3.0-Instruct --local-dir ./HunyuanImage-3-Instruct
172
- ```
173
-
174
- ##### 2️⃣ Run with Transformers
175
-
176
- ```python
177
- from transformers import AutoModelForCausalLM
178
-
179
- # Load the model
180
- model_id = "./HunyuanImage-3-Instruct"
181
- # Currently we cannot load the model using HF model_id `tencent/HunyuanImage-3.0-Instruct` directly
182
- # due to the dot in the name.
183
-
184
- kwargs = dict(
185
- attn_implementation="sdpa",
186
- trust_remote_code=True,
187
- torch_dtype="auto",
188
- device_map="auto",
189
- moe_impl="eager", # Use "flashinfer" if FlashInfer is installed
190
- moe_drop_tokens=True,
191
- )
192
-
193
- model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
194
- model.load_tokenizer(model_id)
195
-
196
- # Image-to-Image generation (TI2I)
197
- prompt = "基于图一的logo,参考图二中冰箱贴的材质,制作一个新的冰箱贴"
198
-
199
- input_img1 = "./assets/demo_instruct_imgs/input_1_0.png"
200
- input_img2 = "./assets/demo_instruct_imgs/input_1_1.png"
201
- imgs_input = [input_img1, input_img2]
202
-
203
- cot_text, samples = model.generate_image(
204
- prompt=prompt,
205
- image=imgs_input,
206
- seed=42,
207
- image_size="auto",
208
- use_system_prompt="en_unified",
209
- bot_task="think_recaption", # Use "think_recaption" for reasoning and enhancement
210
- infer_align_image_size=True, # Align output image size to input image size
211
- diff_infer_steps=50,
212
- verbose=2
213
- )
214
-
215
- # Save the generated image
216
- samples[0].save("image_edit.png")
217
- ```
218
-
219
- #### 🏠 Local Installation & Usage
220
-
221
- ##### 1️⃣ Clone the Repository
222
-
223
- ```bash
224
- git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
225
- cd HunyuanImage-3.0/
226
- ```
227
-
228
- ##### 2️⃣ Download Model Weights
229
-
230
- ```bash
231
- # Download from HuggingFace
232
- hf download tencent/HunyuanImage-3.0-Instruct --local-dir ./HunyuanImage-3-Instruct
233
- ```
234
-
235
- ##### 3️⃣ Run the Demo
236
-
237
- More demos in `run_demo_instruct.sh`.
238
-
239
- ```bash
240
- export MODEL_PATH="./HunyuanImage-3-Instruct"
241
- bash run_demo_instruct.sh
242
- ```
243
-
244
- ##### 4️⃣ Command Line Arguments
245
-
246
- | Arguments | Description | Recommended |
247
- | ----------------------- | ------------------------------------------------------------ | ----------- |
248
- | `--prompt` | Input prompt | (Required) |
249
- | `--image` | Image to run. For multiple images, use comma-separated paths (e.g., 'img1.png,img2.png') | (Required) |
250
- | `--model-id` | Model path | (Required) |
251
- | `--attn-impl` | Attention implementation. Currently only `sdpa` is supported | `sdpa` |
252
- | `--moe-impl` | MoE implementation. Either `eager` or `flashinfer` | `flashinfer` |
253
- | `--seed` | Random seed for image generation. Use None for random seed | `None` |
254
- | `--diff-infer-steps` | Number of inference steps | `50` |
255
- | `--image-size` | Image resolution. Can be `auto`, like `1280x768` or `16:9` | `auto` |
256
- | `--use-system-prompt` | System prompt type. Options: `None`, `dynamic`, `en_vanilla`, `en_recaption`, `en_think_recaption`, `en_unified`, `custom` | `en_unified` |
257
- | `--system-prompt` | Custom system prompt. Used when `--use-system-prompt` is `custom` | `None` |
258
- | `--bot-task` | Task type. `image` for direct generation; `auto` for text; `recaption` for re-write->image; `think_recaption` for think->re-write->image | `think_recaption` |
259
- | `--save` | Image save path | `image.png` |
260
- | `--verbose` | Verbose level | `2` |
261
- | `--reproduce` | Whether to reproduce the results | `True` |
262
- | `--infer-align-image-size` | Whether to align the target image size to the src image size | `True` |
263
- | `--max_new_tokens` | Maximum number of new tokens to generate | `2048` |
264
- | `--use-taylor-cache` | Use Taylor Cache when sampling | `False` |
265
-
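The `--image-size` and `--image` flags above each accept several formats. A minimal sketch of how such values might be parsed — these helpers are purely illustrative and are not part of the HunyuanImage-3.0 repository:

```python
# Hypothetical parsers for the flag formats documented in the table above.
# Not from the repo; shown only to clarify the accepted value shapes.

def parse_image_size(spec: str):
    """Return 'auto', an explicit (width, height), or an aspect ratio."""
    if spec == "auto":
        return "auto"                      # let the model choose a size
    if "x" in spec:                        # explicit resolution, e.g. "1280x768"
        w, h = spec.split("x")
        return ("size", int(w), int(h))
    if ":" in spec:                        # aspect ratio, e.g. "16:9"
        a, b = spec.split(":")
        return ("ratio", int(a), int(b))
    raise ValueError(f"Unrecognized image size spec: {spec!r}")

def parse_images(arg: str):
    """Split a comma-separated --image value into a list of paths."""
    return [p.strip() for p in arg.split(",") if p.strip()]
```

For example, `parse_image_size("16:9")` yields `("ratio", 16, 9)`, and `parse_images("img1.png,img2.png")` yields the two paths the `--image` documentation describes.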
266
- ##### 5️⃣ For fewer Sampling Steps
267
-
268
- We recommend using the model [HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) with `--diff-infer-steps 8`, while keeping all other recommended parameter values **unchanged**.
269
-
270
- ```bash
271
- # Download HunyuanImage-3.0-Instruct-Distil from HuggingFace
272
- hf download tencent/HunyuanImage-3.0-Instruct-Distil --local-dir ./HunyuanImage-3-Instruct-Distil
273
-
274
- # Run the demo with 8 sampling steps
275
- export MODEL_PATH="./HunyuanImage-3-Instruct-Distil"
276
- bash run_demo_instruct_Distil.sh
277
- ```
278
-
279
- <details>
280
- <summary> Previous Version (Pure Text-to-Image) </summary>
281
-
282
- ### HunyuanImage-3.0 (Text-to-image)
283
-
284
- #### 🔥 Quick Start with Transformers
285
-
286
- ##### 1️⃣ Download model weights
287
-
288
- ```bash
289
- # Download from HuggingFace and rename the directory.
290
- # Notice that the directory name should not contain dots, which may cause issues when loading using Transformers.
291
- hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3
292
- ```
293
-
294
- ##### 2️⃣ Run with Transformers
295
-
296
- ```python
297
- from transformers import AutoModelForCausalLM
298
-
299
- # Load the model
300
- model_id = "./HunyuanImage-3"
301
- # Currently we cannot load the model using HF model_id `tencent/HunyuanImage-3.0` directly
302
- # due to the dot in the name.
303
-
304
- kwargs = dict(
305
- attn_implementation="sdpa", # Use "flash_attention_2" if FlashAttention is installed
306
- trust_remote_code=True,
307
- torch_dtype="auto",
308
- device_map="auto",
309
- moe_impl="eager", # Use "flashinfer" if FlashInfer is installed
310
- )
311
-
312
- model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
313
- model.load_tokenizer(model_id)
314
-
315
- # generate the image
316
- prompt = "A brown and white dog is running on the grass"
317
- image = model.generate_image(prompt=prompt, stream=True)
318
- image.save("image.png")
319
- ```
320
-
321
-
322
- #### 🏠 Local Installation & Usage
323
-
324
- ##### 1️⃣ Clone the Repository
325
-
326
- ```bash
327
- git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
328
- cd HunyuanImage-3.0/
329
- ```
330
-
331
- ##### 2️⃣ Download Model Weights
332
-
333
- ```bash
334
- # Download from HuggingFace
335
- hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3
336
- ```
337
-
338
- ##### 3️⃣ Run the Demo
339
- The pretrained checkpoint does not automatically rewrite or enhance input prompts. For optimal results, we currently recommend using DeepSeek to rewrite prompts; you can apply for an API key at [Tencent Cloud](https://cloud.tencent.com/document/product/1772/115963#.E5.BF.AB.E9.80.9F.E6.8E.A5.E5.85.A5).
340
-
341
- ```bash
342
- # Without PE
343
- export MODEL_PATH="./HunyuanImage-3"
344
- python3 run_image_gen.py \
345
- --model-id $MODEL_PATH \
346
- --verbose 1 \
347
- --prompt "A brown and white dog is running on the grass" \
348
- --bot-task image \
349
- --image-size "1024x1024" \
350
- --save ./image.png \
351
- --moe-impl flashinfer
352
-
353
- # With PE
354
- export DEEPSEEK_KEY_ID="your_deepseek_key_id"
355
- export DEEPSEEK_KEY_SECRET="your_deepseek_key_secret"
356
- export MODEL_PATH="./HunyuanImage-3"
357
- python3 run_image_gen.py \
358
- --model-id $MODEL_PATH \
359
- --verbose 1 \
360
- --prompt "A brown and white dog is running on the grass" \
361
- --bot-task image \
362
- --image-size "1024x1024" \
363
- --save ./image.png \
364
- --moe-impl flashinfer \
365
- --rewrite 1
366
-
367
- ```
368
-
369
- ##### 4️⃣ Command Line Arguments
370
-
371
- | Arguments | Description | Recommended |
372
- | ----------------------- | ------------------------------------------------------------ | ----------- |
373
- | `--prompt` | Input prompt | (Required) |
374
- | `--model-id` | Model path | (Required) |
375
- | `--attn-impl` | Attention implementation. Either `sdpa` or `flash_attention_2`. | `sdpa` |
376
- | `--moe-impl` | MoE implementation. Either `eager` or `flashinfer` | `flashinfer` |
377
- | `--seed` | Random seed for image generation | `None` |
378
- | `--diff-infer-steps` | Diffusion infer steps | `50` |
379
- | `--image-size` | Image resolution. Can be `auto`, like `1280x768` or `16:9` | `auto` |
380
- | `--save` | Image save path. | `image.png` |
381
- | `--verbose` | Verbose level. 0: No log; 1: log inference information. | `0` |
382
- | `--rewrite` | Whether to enable rewriting | `1` |
383
-
384
- #### 🎨 Interactive Gradio Demo
385
-
386
- Launch an interactive web interface for easy text-to-image generation.
387
-
388
- ##### 1️⃣ Install Gradio
389
-
390
- ```bash
391
- pip install "gradio>=4.21.0"
392
- ```
393
-
394
- ##### 2️⃣ Configure Environment
395
-
396
- ```bash
397
- # Set your model path
398
- export MODEL_ID="path/to/your/model"
399
-
400
- # Optional: Configure GPU usage (default: 0,1,2,3)
401
- export GPUS="0,1,2,3"
402
-
403
- # Optional: Configure host and port (default: 0.0.0.0:443)
404
- export HOST="0.0.0.0"
405
- export PORT="443"
406
- ```
407
-
408
- ##### 3️⃣ Launch the Web Interface
409
-
410
- **Basic Launch:**
411
- ```bash
412
- sh run_app.sh
413
- ```
414
-
415
- **With Performance Optimizations:**
416
- ```bash
417
- # Use both optimizations for maximum performance
418
- sh run_app.sh --moe-impl flashinfer --attn-impl flash_attention_2
419
- ```
420
-
421
- ##### 4️⃣ Access the Interface
422
-
423
- > 🌐 **Web Interface:** Open your browser and navigate to `http://localhost:443` (or your configured port)
424
-
425
-
426
- </details>
427
-
428
- ## 🧱 Models Cards
429
-
430
- | Model | Params | Download | Recommended VRAM | Supported |
431
- |---------------------------| --- | --- | --- | --- |
432
- | HunyuanImage-3.0 | 80B total (13B active) | [HuggingFace](https://huggingface.co/tencent/HunyuanImage-3.0) | 3 × 80 GB | ✅ Text-to-Image
433
- | HunyuanImage-3.0-Instruct | 80B total (13B active) | [HuggingFace](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) | ≥ 8 × 80 GB | ✅ Text-to-Image<br>✅ Text-Image-to-Image<br>✅ Prompt Self-Rewrite <br>✅ CoT Think
434
- | HunyuanImage-3.0-Instruct-Distil | 80B total (13B active) | [HuggingFace](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) | ≥ 8 × 80 GB |✅ Text-to-Image<br>✅ Text-Image-to-Image<br>✅ Prompt Self-Rewrite <br>✅ CoT Think <br>✅ Fewer sampling steps (8 steps recommended)
435
-
436
- Notes:
437
- - Install performance extras (FlashAttention, FlashInfer) for faster inference.
438
- - Multi‑GPU inference is recommended for the Base model.
439
-
440
-
441
- ## 📊 Evaluation
442
-
443
- ### Evaluation of HunyuanImage-3.0-Instruct
444
- * 👥 **GSB (Human Evaluation)**
445
- We adopted the GSB (Good/Same/Bad) evaluation method commonly used to assess the relative performance between two models from an overall image perception perspective. In total, we utilized 1,000+ single- and multi-image editing cases, generating an equal number of image samples for all compared models in a single run. For a fair comparison, we conducted inference only once for each prompt, avoiding any cherry-picking of results. When comparing with the baseline methods, we maintained the default settings for all selected models. The evaluation was performed by more than 100 professional evaluators.
446
-
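GSB tallies are often summarized as a single relative score. A minimal sketch, assuming the common (Good − Bad) / Total convention — the report itself may aggregate differently, and the counts below are made up:

```python
# Toy GSB aggregation. The (good - bad) / total convention and the example
# counts are assumptions for illustration, not figures from the evaluation.

def gsb_score(good: int, same: int, bad: int) -> float:
    """Relative preference for model A over model B, in [-1, 1]."""
    total = good + same + bad
    return (good - bad) / total

# Hypothetical tally over 1,000 side-by-side comparisons.
print(gsb_score(420, 380, 200))  # 0.22
```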
447
- <p align="center">
448
- <img src="./assets/gsb_instruct.png" width=60% alt="Human Evaluation with Other Models">
449
- </p>
450
-
451
-
452
- ### Evaluation of HunyuanImage-3.0 (Text-to-Image)
453
-
454
- * 🤖 **SSAE (Machine Evaluation)**
455
- SSAE (Structured Semantic Alignment Evaluation) is an intelligent evaluation metric for image-text alignment based on advanced multimodal large language models (MLLMs). We extracted 3500 key points across 12 categories, then used multimodal large language models to automatically evaluate and score by comparing the generated images with these key points based on the visual content of the images. Mean Image Accuracy represents the image-wise average score across all key points, while Global Accuracy directly calculates the average score across all key points.
456
-
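The two SSAE aggregations described above differ only in averaging order: per-image first versus over all key points directly. A minimal sketch with fabricated scores, purely to show the difference:

```python
# Mean Image Accuracy: average each image's key-point scores, then average
# over images. Global Accuracy: average over all key points directly.
# The score data below is fabricated for illustration only.

def mean_image_accuracy(per_image_scores):
    per_image_means = [sum(s) / len(s) for s in per_image_scores]
    return sum(per_image_means) / len(per_image_means)

def global_accuracy(per_image_scores):
    flat = [x for s in per_image_scores for x in s]
    return sum(flat) / len(flat)

scores = [[1, 1, 0, 1], [1, 0]]  # key-point pass/fail per generated image
print(mean_image_accuracy(scores))  # (0.75 + 0.5) / 2 = 0.625
print(global_accuracy(scores))      # 4 passes / 6 key points ≈ 0.667
```

The two metrics diverge whenever images carry different numbers of key points, which is why the report states both.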
457
- <p align="center">
458
- <img src="./assets/ssae_side_by_side_comparison.png" width=98% alt="Human Evaluation with Other Models">
459
- </p>
460
-
461
- <p align="center">
462
- <img src="./assets/ssae_side_by_side_heatmap.png" width=98% alt="Human Evaluation with Other Models">
463
- </p>
464
-
465
-
466
- * 👥 **GSB (Human Evaluation)**
467
-
468
- We adopted the GSB (Good/Same/Bad) evaluation method commonly used to assess the relative performance between two models from an overall image perception perspective. In total, we utilized 1,000 text prompts, generating an equal number of image samples for all compared models in a single run. For a fair comparison, we conducted inference only once for each prompt, avoiding any cherry-picking of results. When comparing with the baseline methods, we maintained the default settings for all selected models. The evaluation was performed by more than 100 professional evaluators.
469
-
470
- <p align="center">
471
- <img src="./assets/gsb.png" width=98% alt="Human Evaluation with Other Models">
472
- </p>
473
-
474
- ## 🖼️ Showcase
475
-
476
- Our model can follow complex instructions to generate high‑quality, creative images.
477
-
478
- <div align="center">
479
- <img src="./assets/banner_all.jpg" width=100% alt="HunyuanImage 3.0 Demo">
480
- </div>
481
-
482
- For text-to-image showcases in HunyuanImage-3.0, click the following links:
483
-
484
- - [HunyuanImage-3.0](./Hunyuan-Image3.md)
485
-
486
- ### Showcases of HunyuanImage-3.0-Instruct
487
-
488
- HunyuanImage-3.0-Instruct demonstrates powerful capabilities in intelligent image generation and editing. The following showcases highlight its core features:
489
-
490
- * 🧠 **Intelligent Visual Understanding and Reasoning (CoT Think)**: The model performs structured thinking to analyze the user's input image and prompt, expanding the user's intent and editing task into structured, comprehensive instructions that lead to better image generation and editing performance.
491
-
492
- It breaks down complex prompts and editing tasks into detailed visual components, including subject, composition, lighting, color palette, and style.
493
-
494
- * ✏️ **Prompt Self-Rewrite**: Automatically enhances sparse or vague prompts into professional-grade, detail-rich descriptions that capture the user's intent more accurately.
495
-
496
- * 🎨 **Text-to-Image (T2I)**: Generates high-quality images from text prompts with exceptional prompt adherence and photorealistic quality.
497
-
498
- * 🖼️ **Image-to-Image (TI2I)**: Supports creative image editing, including adding elements, removing objects, modifying styles, and seamless background replacement while preserving key visual elements.
499
-
500
- * 🔀 **Multi-Image Fusion**: Intelligently combines multiple reference images (up to 3 inputs) to create coherent composite images that integrate visual elements from different sources.
501
-
502
-
503
- **Showcase 1: Detailed Thought and Reasoning Process**
504
-
505
- <div align="center">
506
- <img src="./assets/pg_instruct_imgs/cot_ti2i.gif" alt="HunyuanImage-3.0-Instruct Showcase 1" width="90%">
507
- </div>
508
-
509
- **Showcase 2: Creative T2I Generation with Complex Scene Understanding**
510
-
511
- > Prompt: 3D 毛绒质感拟人化马,暖棕浅棕肌理,穿藏蓝西装、白衬衫,戴深棕手套;疲惫带期待,坐于电脑前,旁置印 "HAPPY AGAIN" 的马克杯。橙红渐变背景,配超大号藏蓝粗体 "马上下班",叠加米黄 "Happy New Year" 并标 "(2026)"。橙红为主,藏蓝米黄撞色,毛绒温暖柔和。
512
-
513
- <div align="center">
514
- <img src="./assets/pg_instruct_imgs/image0.png" alt="HunyuanImage-3.0-Instruct Showcase 2" width="75%">
515
- </div>
516
-
517
- **Showcase 3: Precise Image Editing with Element Preservation**
518
-
519
- <div align="center">
520
- <img src="./assets/pg_instruct_imgs/image1.png" alt="HunyuanImage-3.0-Instruct Showcase 3" width="85%">
521
- </div>
522
-
523
- **Showcase 4: Style Transformation with Thematic Enhancement**
524
-
525
- <div align="center">
526
- <img src="./assets/pg_instruct_imgs/image2.png" alt="HunyuanImage-3.0-Instruct Showcase 4" width="85%">
527
- </div>
528
-
529
-
530
- **Showcase 5: Advanced Style Transfer and Product Mockup Generation**
531
-
532
- <div align="center">
533
- <img src="./assets/pg_instruct_imgs/image3.png" alt="HunyuanImage-3.0-Instruct Showcase 5" width="85%">
534
- </div>
535
-
536
-
537
- **Showcase 6: Multi-Image Fusion and Creative Composition**
538
-
539
- <div align="center">
540
- <img src="./assets/pg_instruct_imgs/image4.png" alt="HunyuanImage-3.0-Instruct Showcase 6" width="85%">
541
- </div>
542
-
543
-
544
- ## 📚 Citation
545
-
546
- If you find HunyuanImage-3.0 useful in your research, please cite our work:
547
-
548
- ```bibtex
549
- @article{cao2025hunyuanimage,
550
- title={HunyuanImage 3.0 Technical Report},
551
- author={Cao, Siyu and Chen, Hangting and Chen, Peng and Cheng, Yiji and Cui, Yutao and Deng, Xinchi and Dong, Ying and Gong, Kipper and Gu, Tianpeng and Gu, Xiusen and others},
552
- journal={arXiv preprint arXiv:2509.23951},
553
- year={2025}
554
- }
555
- ```
556
-
557
- ## 🙏 Acknowledgements
558
-
559
- We extend our heartfelt gratitude to the following open-source projects and communities for their invaluable contributions:
560
-
561
- * 🤗 [Transformers](https://github.com/huggingface/transformers) - State-of-the-art NLP library
562
- * 🎨 [Diffusers](https://github.com/huggingface/diffusers) - Diffusion models library
563
- * 🌐 [HuggingFace](https://huggingface.co/) - AI model hub and community
564
- * [FlashAttention](https://github.com/Dao-AILab/flash-attention) - Memory-efficient attention
565
- * 🚀 [FlashInfer](https://github.com/flashinfer-ai/flashinfer) - Optimized inference engine
566
-
567
- ## 🌟🚀 GitHub Star History
568
-
569
- [![GitHub stars](https://img.shields.io/github/stars/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
570
- [![GitHub forks](https://img.shields.io/github/forks/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
571
-
572
-
573
- [![Star History Chart](https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanImage-3.0&type=Date)](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)
 
 
1
+ ---
2
+ license: other
3
+ pipeline_tag: image-to-image
4
+ library_name: transformers
5
+ ---
6
+
7
+
8
+ [中文文档](./README_zh_CN.md)
9
+
10
+ <div align="center">
11
+
12
+ <img src="./assets/logo.png" alt="HunyuanImage-3.0 Logo" width="600">
13
+
14
+ # 🎨 HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
15
+
16
+ </div>
17
+
18
+
19
+ <div align="center">
20
+ <img src="./assets/banner.png" alt="HunyuanImage-3.0 Banner" width="800">
21
+
22
+ </div>
23
+
24
+ <div align="center">
25
+ <a href=https://hunyuan.tencent.com/image target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
26
+ <a href=https://huggingface.co/tencent/HunyuanImage-3.0-Instruct target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
27
+ <a href=https://github.com/Tencent-Hunyuan/HunyuanImage-3.0 target="_blank"><img src= https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
28
+ <a href=https://arxiv.org/pdf/2509.23951 target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
29
+ <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
30
+ <a href=https://docs.qq.com/doc/DUVVadmhCdG9qRXBU target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a>
31
+ </div>
32
+
33
+
34
+ <p align="center">
35
+ 👏 Join our <a href="./assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
36
+ 💻 <a href="https://hunyuan.tencent.com/chat/HunyuanDefault?from=modelSquare&modelId=Hunyuan-Image-3.0-Instruct">Official website: Try our model!</a>&nbsp;&nbsp;
37
+ </p>
38
+
39
+ ## 🔥🔥🔥 News
40
+
41
+ - **January 26, 2026**: 🚀 **[HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil)** - Distilled checkpoint for efficient deployment (8 steps sampling recommended).
42
+ - **January 26, 2026**: 🎉 **[HunyuanImage-3.0-Instruct](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct)** - Release of **Instruct (with reasoning)** for intelligent prompt enhancement and **Image-to-Image** generation for creative editing.
43
+ - **October 30, 2025**: 🚀 **[HunyuanImage-3.0 vLLM Acceleration](./vllm_infer/README.md)** - Significantly faster inference with vLLM support.
44
+ - **September 28, 2025**: 📖 **[HunyuanImage-3.0 Technical Report](https://arxiv.org/pdf/2509.23951)** - Comprehensive technical documentation now available.
45
+ - **September 28, 2025**: 🎉 **[HunyuanImage-3.0 Open Source](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)** - Inference code and model weights publicly available.
46
+
47
+
48
+ ## 🧩 Community Contributions
49
+
50
+ If you develop or use HunyuanImage-3.0 in your projects, we would love to hear about it.
51
+
52
+ ## 📑 Open-source Plan
53
+
54
+ - HunyuanImage-3.0 (Image Generation Model)
55
+ - [x] Inference
56
+ - [x] HunyuanImage-3.0 Checkpoints
57
+ - [x] HunyuanImage-3.0-Instruct Checkpoints (with reasoning)
58
+ - [x] vLLM Support
59
+ - [x] Distilled Checkpoints
60
+ - [x] Image-to-Image Generation
61
+ - [ ] Multi-turn Interaction
62
+
63
+
64
+ ## 🗂️ Contents
65
+ - [🔥🔥🔥 News](#-news)
66
+ - [🧩 Community Contributions](#-community-contributions)
67
+ - [📑 Open-source Plan](#-open-source-plan)
68
+ - [📖 Introduction](#-introduction)
69
+ [✨ Key Features](#-key-features)
70
+ - [🚀 Usage](#-usage)
71
+ - [📦 Environment Setup](#-environment-setup)
72
+ - [📥 Install Dependencies](#-install-dependencies)
73
+ - [HunyuanImage-3.0-Instruct](#hunyuanimage-30-instruct-instruction-reasoning-and-image-to-image-generation-including-editing-and-multi-image-fusion)
74
+ - [🔥 Quick Start with Transformers](#-quick-start-with-transformers)
75
+ - [1️⃣ Download model weights](#1-download-model-weights)
76
+ - [2️⃣ Run with Transformers](#2-run-with-transformers)
77
+ - [🏠 Local Installation & Usage](#-local-installation--usage)
78
+ - [1️⃣ Clone the Repository](#1-clone-the-repository)
79
+ - [2️⃣ Download Model Weights](#2-download-model-weights)
80
+ - [3️⃣ Run the Demo](#3-run-the-demo)
81
+ - [4️⃣ Command Line Arguments](#4-command-line-arguments)
82
+ - [5️⃣ For fewer Sampling Steps](#5-for-fewer-sampling-steps)
83
+ - [HunyuanImage-3.0 (Text-to-image)](#hunyuanimage-30-text-to-image)
84
+ - [🔥 Quick Start with Transformers](#-quick-start-with-transformers-1)
85
+ - [1️⃣ Download model weights](#1-download-model-weights-1)
86
+ - [2️⃣ Run with Transformers](#2-run-with-transformers-1)
87
+ - [🏠 Local Installation & Usage](#-local-installation--usage-1)
88
+ - [1️⃣ Clone the Repository](#1-clone-the-repository-1)
89
+ - [2️⃣ Download Model Weights](#2-download-model-weights-1)
90
+ - [3️⃣ Run the Demo](#3-run-the-demo-1)
91
+ - [4️⃣ Command Line Arguments](#4-command-line-arguments-1)
92
+ - [🎨 Interactive Gradio Demo](#-interactive-gradio-demo)
93
+ - [1️⃣ Install Gradio](#1-install-gradio)
94
+ - [2️⃣ Configure Environment](#2-configure-environment)
95
+ - [3️⃣ Launch the Web Interface](#3-launch-the-web-interface)
96
+ - [4️⃣ Access the Interface](#4-access-the-interface)
97
+ - [🧱 Models Cards](#-models-cards)
98
+ - [📊 Evaluation](#-evaluation)
99
+ - [Evaluation of HunyuanImage-3.0-Instruct](#evaluation-of-hunyuanimage-30-instruct)
100
+ - [Evaluation of HunyuanImage-3.0 (Text-to-Image)](#evaluation-of-hunyuanimage-30-text-to-image)
101
+ - [🖼️ Showcase](#-showcase)
102
+ - [Showcases of HunyuanImage-3.0-Instruct](#showcases-of-hunyuanimage-30-instruct)
103
+ - [📚 Citation](#-citation)
104
+ - [🙏 Acknowledgements](#-acknowledgements)
105
+ - [🌟🚀 Github Star History](#-github-star-history)
106
+
107
---

## 📖 Introduction

**HunyuanImage-3.0** is a groundbreaking native multimodal model that unifies multimodal understanding and generation within an autoregressive framework. Our text-to-image and image-to-image model achieves performance **comparable to or surpassing** leading closed-source models.


<div align="center">
<img src="./assets/framework.png" alt="HunyuanImage-3.0 Framework" width="90%">
</div>

## ✨ Key Features

* 🧠 **Unified Multimodal Architecture:** Moving beyond the prevalent DiT-based architectures, HunyuanImage-3.0 employs a unified autoregressive framework. This design enables more direct and integrated modeling of text and image modalities, leading to surprisingly effective and contextually rich image generation.

* 🏆 **The Largest Image Generation MoE Model:** This is the largest open-source image generation Mixture of Experts (MoE) model to date. It features 64 experts and a total of 80 billion parameters, with 13 billion activated per token, significantly enhancing its capacity and performance.

* 🎨 **Superior Image Generation Performance:** Through rigorous dataset curation and advanced reinforcement learning post-training, we've achieved an optimal balance between semantic accuracy and visual excellence. The model demonstrates exceptional prompt adherence while delivering photorealistic imagery with stunning aesthetic quality and fine-grained details.

* 💭 **Intelligent Image Understanding and World-Knowledge Reasoning:** The unified multimodal architecture endows HunyuanImage-3.0 with powerful reasoning capabilities. It understands the user's input image and leverages its extensive world knowledge to intelligently interpret user intent, automatically elaborating on sparse prompts with contextually appropriate details to produce superior, more complete visual outputs.


## 🚀 Usage

### 📦 Environment Setup

* 🐍 **Python:** 3.12+ (recommended and tested)
* ⚡ **CUDA:** 12.8

#### 📥 Install Dependencies

```bash
# 1. First install PyTorch (CUDA 12.8 version)
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 2. Install the Tencent Cloud SDK for Prompt Enhancement (PE).
#    Only needed for HunyuanImage-3.0, not for HunyuanImage-3.0-Instruct.
pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python

# 3. Then install the other dependencies
pip install -r requirements.txt
```

For **up to 3x faster inference**, install these optimizations:

```bash
# FlashInfer for optimized MoE inference. v0.5.0 is tested.
pip install flashinfer-python==0.5.0
```
> 💡 **Installation Tips:** It is critical that the CUDA version used by PyTorch matches the system's CUDA version.
> FlashInfer relies on this compatibility when compiling kernels at runtime.
> GCC version >= 9 is recommended for compiling FlashAttention and FlashInfer.

> ⚡ **Performance Tips:** These optimizations can significantly speed up your inference!

> 💡 **Note:** When FlashInfer is enabled, the first inference may be slower (about 10 minutes) due to kernel compilation. Subsequent inferences on the same machine will be much faster.

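One way to sanity-check that match before waiting through a long kernel compilation is to compare the CUDA tag baked into the PyTorch wheel against the system toolkit version. The helper below is an illustrative sketch, not part of this repository; it assumes version strings shaped like `2.8.0+cu128` (from `torch.__version__`) and `12.8` (from `nvcc --version`).

```python
def cuda_tags_match(torch_version: str, toolkit_version: str) -> bool:
    """Check a torch wheel tag like '2.8.0+cu128' against a CUDA toolkit
    version like '12.8'. Illustrative helper, not part of the repository."""
    if "+cu" not in torch_version:
        return False  # CPU-only or differently tagged wheel
    wheel_tag = torch_version.split("+cu", 1)[1]   # e.g. "128"
    major, minor = toolkit_version.split(".")[:2]  # e.g. "12", "8"
    return wheel_tag == f"{major}{minor}"

# The versions pinned by the install command above:
print(cuda_tags_match("2.8.0+cu128", "12.8"))  # True
```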
### HunyuanImage-3.0-Instruct (Instruction reasoning and Image-to-image generation, including editing and multi-image fusion)

#### 🔥 Quick Start with Transformers

##### 1️⃣ Download model weights

```bash
# Download from HuggingFace and rename the directory.
# Note that the directory name should not contain dots, which may cause issues when loading with Transformers.
hf download tencent/HunyuanImage-3.0-Instruct --local-dir ./HunyuanImage-3-Instruct
```

##### 2️⃣ Run with Transformers

```python
from transformers import AutoModelForCausalLM

# Load the model
model_id = "./HunyuanImage-3-Instruct"
# Currently we cannot load the model using the HF model_id `tencent/HunyuanImage-3.0-Instruct` directly
# due to the dot in the name.

kwargs = dict(
    attn_implementation="sdpa",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    moe_impl="eager",  # Use "flashinfer" if FlashInfer is installed
    moe_drop_tokens=True,
)

model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
model.load_tokenizer(model_id)

# Image-to-Image generation (TI2I)
# Prompt (Chinese): "Based on the logo in image 1, and referring to the
# fridge-magnet material in image 2, create a new fridge magnet."
prompt = "基于图一的logo,参考图二中冰箱贴的材质,制作一个新的冰箱贴"

input_img1 = "./assets/demo_instruct_imgs/input_1_0.png"
input_img2 = "./assets/demo_instruct_imgs/input_1_1.png"
imgs_input = [input_img1, input_img2]

cot_text, samples = model.generate_image(
    prompt=prompt,
    image=imgs_input,
    seed=42,
    image_size="auto",
    use_system_prompt="en_unified",
    bot_task="think_recaption",   # Use "think_recaption" for reasoning and enhancement
    infer_align_image_size=True,  # Align output image size to input image size
    diff_infer_steps=50,
    verbose=2,
)

# Save the generated image
samples[0].save("image_edit.png")
```
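`generate_image` returns both the reasoning trace and a list of generated images. A small helper like the sketch below (illustrative, not part of the repository; it assumes `samples` holds PIL-style images exposing a `.save()` method, as in the example above) can persist everything from one call:

```python
import os

def save_outputs(cot_text, samples, out_dir="outputs", stem="edit"):
    """Persist the reasoning trace and every image returned by
    model.generate_image, as called in the example above."""
    os.makedirs(out_dir, exist_ok=True)
    if cot_text:
        # Keep the chain-of-thought next to the images for later inspection.
        with open(os.path.join(out_dir, f"{stem}_cot.txt"), "w", encoding="utf-8") as f:
            f.write(cot_text)
    paths = []
    for i, img in enumerate(samples):
        path = os.path.join(out_dir, f"{stem}_{i}.png")
        img.save(path)  # PIL-style image objects expose .save()
        paths.append(path)
    return paths
```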

#### 🏠 Local Installation & Usage

##### 1️⃣ Clone the Repository

```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0/
```

##### 2️⃣ Download Model Weights

```bash
# Download from HuggingFace
hf download tencent/HunyuanImage-3.0-Instruct --local-dir ./HunyuanImage-3-Instruct
```

##### 3️⃣ Run the Demo

More demos can be found in `run_demo_instruct.sh`.

```bash
export MODEL_PATH="./HunyuanImage-3-Instruct"
bash run_demo_instruct.sh
```

##### 4️⃣ Command Line Arguments

| Arguments | Description | Recommended |
| ----------------------- | ------------------------------------------------------------ | ----------- |
| `--prompt` | Input prompt | (Required) |
| `--image` | Input image(s). For multiple images, use comma-separated paths (e.g., `img1.png,img2.png`) | (Required) |
| `--model-id` | Model path | (Required) |
| `--attn-impl` | Attention implementation. Currently only `sdpa` is supported | `sdpa` |
| `--moe-impl` | MoE implementation. Either `eager` or `flashinfer` | `flashinfer` |
| `--seed` | Random seed for image generation. Use `None` for a random seed | `None` |
| `--diff-infer-steps` | Number of inference steps | `50` |
| `--image-size` | Image resolution. Can be `auto`, explicit like `1280x768`, or a ratio like `16:9` | `auto` |
| `--use-system-prompt` | System prompt type. Options: `None`, `dynamic`, `en_vanilla`, `en_recaption`, `en_think_recaption`, `en_unified`, `custom` | `en_unified` |
| `--system-prompt` | Custom system prompt. Used when `--use-system-prompt` is `custom` | `None` |
| `--bot-task` | Task type. `image` for direct generation; `auto` for text; `recaption` for rewrite->image; `think_recaption` for think->rewrite->image | `think_recaption` |
| `--save` | Image save path | `image.png` |
| `--verbose` | Verbose level | `2` |
| `--reproduce` | Whether to reproduce the results | `True` |
| `--infer-align-image-size` | Whether to align the target image size to the source image size | `True` |
| `--max_new_tokens` | Maximum number of new tokens to generate | `2048` |
| `--use-taylor-cache` | Use Taylor Cache when sampling | `False` |

##### 5️⃣ For fewer Sampling Steps

We recommend using the [HunyuanImage-3.0-Instruct-Distil](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) model with `--diff-infer-steps 8`, while keeping all other recommended parameter values **unchanged**.

```bash
# Download HunyuanImage-3.0-Instruct-Distil from HuggingFace
hf download tencent/HunyuanImage-3.0-Instruct-Distil --local-dir ./HunyuanImage-3-Instruct-Distil

# Run the demo with 8 sampling steps
export MODEL_PATH="./HunyuanImage-3-Instruct-Distil"
bash run_demo_instruct_Distil.sh
```

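The distillation pays off directly in the denoising loop, which runs 8 steps instead of the default 50; the end-to-end speedup will typically be smaller than the raw step ratio, since prompt processing and reasoning are not affected:

```python
full_steps, distil_steps = 50, 8

# Ratio of denoising iterations saved by the distilled checkpoint.
ratio = full_steps / distil_steps
print(f"{ratio:.3g}x fewer sampling steps")  # 6.25x fewer sampling steps
```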
<details>
<summary> Previous Version (Pure Text-to-Image) </summary>

### HunyuanImage-3.0 (Text-to-image)

#### 🔥 Quick Start with Transformers

##### 1️⃣ Download model weights

```bash
# Download from HuggingFace and rename the directory.
# Note that the directory name should not contain dots, which may cause issues when loading with Transformers.
hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3
```

##### 2️⃣ Run with Transformers

```python
from transformers import AutoModelForCausalLM

# Load the model
model_id = "./HunyuanImage-3"
# Currently we cannot load the model using the HF model_id `tencent/HunyuanImage-3.0` directly
# due to the dot in the name.

kwargs = dict(
    attn_implementation="sdpa",  # Use "flash_attention_2" if FlashAttention is installed
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    moe_impl="eager",  # Use "flashinfer" if FlashInfer is installed
)

model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
model.load_tokenizer(model_id)

# Generate the image
prompt = "A brown and white dog is running on the grass"
image = model.generate_image(prompt=prompt, stream=True)
image.save("image.png")
```


#### 🏠 Local Installation & Usage

##### 1️⃣ Clone the Repository

```bash
git clone https://github.com/Tencent-Hunyuan/HunyuanImage-3.0.git
cd HunyuanImage-3.0/
```

##### 2️⃣ Download Model Weights

```bash
# Download from HuggingFace
hf download tencent/HunyuanImage-3.0 --local-dir ./HunyuanImage-3
```

##### 3️⃣ Run the Demo
The pretrained checkpoint does not automatically rewrite or enhance input prompts. For optimal results, we currently recommend using DeepSeek to rewrite prompts. You can apply for an API key at [Tencent Cloud](https://cloud.tencent.com/document/product/1772/115963#.E5.BF.AB.E9.80.9F.E6.8E.A5.E5.85.A5).

```bash
# Without PE
export MODEL_PATH="./HunyuanImage-3"
python3 run_image_gen.py \
    --model-id $MODEL_PATH \
    --verbose 1 \
    --prompt "A brown and white dog is running on the grass" \
    --bot-task image \
    --image-size "1024x1024" \
    --save ./image.png \
    --moe-impl flashinfer

# With PE
export DEEPSEEK_KEY_ID="your_deepseek_key_id"
export DEEPSEEK_KEY_SECRET="your_deepseek_key_secret"
export MODEL_PATH="./HunyuanImage-3"
python3 run_image_gen.py \
    --model-id $MODEL_PATH \
    --verbose 1 \
    --prompt "A brown and white dog is running on the grass" \
    --bot-task image \
    --image-size "1024x1024" \
    --save ./image.png \
    --moe-impl flashinfer \
    --rewrite 1
```

##### 4️⃣ Command Line Arguments

| Arguments | Description | Recommended |
| ----------------------- | ------------------------------------------------------------ | ----------- |
| `--prompt` | Input prompt | (Required) |
| `--model-id` | Model path | (Required) |
| `--attn-impl` | Attention implementation. Either `sdpa` or `flash_attention_2` | `sdpa` |
| `--moe-impl` | MoE implementation. Either `eager` or `flashinfer` | `flashinfer` |
| `--seed` | Random seed for image generation | `None` |
| `--diff-infer-steps` | Number of diffusion inference steps | `50` |
| `--image-size` | Image resolution. Can be `auto`, explicit like `1280x768`, or a ratio like `16:9` | `auto` |
| `--save` | Image save path | `image.png` |
| `--verbose` | Verbose level. 0: no log; 1: log inference information | `0` |
| `--rewrite` | Whether to enable prompt rewriting | `1` |

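When a ratio such as `16:9` is passed to `--image-size`, the model has to pick a concrete resolution for it. The sketch below only illustrates the idea (the repository's actual resolution bucketing may differ): it maps a ratio string to a width and height near one megapixel, snapped to multiples of 32.

```python
def ratio_to_size(ratio: str, target_pixels: int = 1024 * 1024, multiple: int = 32):
    """Map an aspect-ratio string like '16:9' to a concrete (width, height)
    near `target_pixels`, snapped to a multiple of `multiple`.
    Illustrative only; the model's real resolution buckets may differ."""
    w_ratio, h_ratio = (int(part) for part in ratio.split(":"))
    scale = (target_pixels / (w_ratio * h_ratio)) ** 0.5

    def snap(value: float) -> int:
        return max(multiple, round(value / multiple) * multiple)

    return snap(w_ratio * scale), snap(h_ratio * scale)

print(ratio_to_size("1:1"))   # (1024, 1024)
print(ratio_to_size("16:9"))  # (1376, 768)
```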
#### 🎨 Interactive Gradio Demo

Launch an interactive web interface for easy text-to-image generation.

##### 1️⃣ Install Gradio

```bash
pip install "gradio>=4.21.0"
```

##### 2️⃣ Configure Environment

```bash
# Set your model path
export MODEL_ID="path/to/your/model"

# Optional: Configure GPU usage (default: 0,1,2,3)
export GPUS="0,1,2,3"

# Optional: Configure host and port (default: 0.0.0.0:443)
export HOST="0.0.0.0"
export PORT="443"
```

##### 3️⃣ Launch the Web Interface

**Basic Launch:**
```bash
sh run_app.sh
```

**With Performance Optimizations:**
```bash
# Use both optimizations for maximum performance
sh run_app.sh --moe-impl flashinfer --attn-impl flash_attention_2
```

##### 4️⃣ Access the Interface

> 🌐 **Web Interface:** Open your browser and navigate to `http://localhost:443` (or your configured port)


</details>

## 🧱 Model Cards

| Model | Params | Download | Recommended VRAM | Supported |
|---------------------------| --- | --- | --- | --- |
| HunyuanImage-3.0 | 80B total (13B active) | [HuggingFace](https://huggingface.co/tencent/HunyuanImage-3.0) | ≥ 3 × 80 GB | ✅ Text-to-Image |
| HunyuanImage-3.0-Instruct | 80B total (13B active) | [HuggingFace](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct) | ≥ 8 × 80 GB | ✅ Text-to-Image<br>✅ Text-Image-to-Image<br>✅ Prompt Self-Rewrite<br>✅ CoT Think |
| HunyuanImage-3.0-Instruct-Distil | 80B total (13B active) | [HuggingFace](https://huggingface.co/tencent/HunyuanImage-3.0-Instruct-Distil) | ≥ 8 × 80 GB | ✅ Text-to-Image<br>✅ Text-Image-to-Image<br>✅ Prompt Self-Rewrite<br>✅ CoT Think<br>✅ Fewer sampling steps (8 steps recommended) |

Notes:
- Install performance extras (FlashAttention, FlashInfer) for faster inference.
- Multi-GPU inference is recommended for the Base model.

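The VRAM recommendations follow from the parameter count alone: 80B parameters in bf16 occupy roughly 149 GiB before activations and KV cache are counted, which already exceeds a single 80 GB card.

```python
params = 80e9        # total parameters
bytes_per_param = 2  # bf16/fp16

# Memory for the weights alone, before activations and KV cache.
weights_gib = params * bytes_per_param / 1024**3
print(f"~{weights_gib:.0f} GiB for weights alone")  # ~149 GiB for weights alone
```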
## 📊 Evaluation

### Evaluation of HunyuanImage-3.0-Instruct
* 👥 **GSB (Human Evaluation)**
We adopted the GSB (Good/Same/Bad) evaluation method, commonly used to assess the relative performance of two models from an overall image perception perspective. In total, we utilized 1,000+ single- and multi-image editing cases, generating an equal number of image samples for all compared models in a single run. For a fair comparison, we conducted inference only once for each prompt, avoiding any cherry-picking of results. When comparing with the baseline methods, we maintained the default settings for all selected models. The evaluation was performed by more than 100 professional evaluators.

<p align="center">
<img src="./assets/gsb_instruct.png" width=60% alt="Human Evaluation with Other Models">
</p>

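A GSB comparison is often summarized as a single relative score. One common convention (the report may aggregate differently) is the vote margin over all votes:

```python
def gsb_score(good: int, same: int, bad: int) -> float:
    """Relative GSB percentage: (Good - Bad) / total votes.
    Positive values favor the evaluated model over the baseline;
    'Same' votes dilute the margin but do not change its sign."""
    total = good + same + bad
    return (good - bad) / total * 100

# Hypothetical vote counts over 1,000 comparisons.
print(gsb_score(450, 350, 200))  # 25.0
```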

### Evaluation of HunyuanImage-3.0 (Text-to-Image)

* 🤖 **SSAE (Machine Evaluation)**
SSAE (Structured Semantic Alignment Evaluation) is an intelligent evaluation metric for image-text alignment based on advanced multimodal large language models (MLLMs). We extracted 3,500 key points across 12 categories, then used multimodal large language models to automatically evaluate and score the generated images by comparing their visual content against these key points. Mean Image Accuracy is the image-wise average score over all key points, while Global Accuracy averages directly over all key points.

<p align="center">
<img src="./assets/ssae_side_by_side_comparison.png" width=98% alt="SSAE Comparison with Other Models">
</p>

<p align="center">
<img src="./assets/ssae_side_by_side_heatmap.png" width=98% alt="SSAE Heatmap Comparison with Other Models">
</p>

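The two aggregates differ whenever images carry different numbers of key points; a toy example (hypothetical scores, not from the report) makes the distinction concrete:

```python
# Toy per-image key-point scores (1 = key point matched, 0 = missed).
images = [
    [1, 1, 0],        # 3 key points -> image accuracy 2/3
    [1, 0, 0, 0, 1],  # 5 key points -> image accuracy 2/5
]

# Mean Image Accuracy: average the per-image accuracies.
mean_image_acc = sum(sum(scores) / len(scores) for scores in images) / len(images)

# Global Accuracy: average over all key points pooled together.
pooled = [s for scores in images for s in scores]
global_acc = sum(pooled) / len(pooled)

print(round(mean_image_acc, 3), global_acc)  # 0.533 0.5
```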
* 👥 **GSB (Human Evaluation)**

We adopted the same GSB (Good/Same/Bad) evaluation method to assess relative performance from an overall image perception perspective. In total, we utilized 1,000 text prompts, generating an equal number of image samples for all compared models in a single run. For a fair comparison, we conducted inference only once for each prompt, avoiding any cherry-picking of results. When comparing with the baseline methods, we maintained the default settings for all selected models. The evaluation was performed by more than 100 professional evaluators.

<p align="center">
<img src="./assets/gsb.png" width=98% alt="Human Evaluation with Other Models">
</p>

## 🖼️ Showcase

Our model can follow complex instructions to generate high-quality, creative images.

<div align="center">
<img src="./assets/banner_all.jpg" width=100% alt="HunyuanImage 3.0 Demo">
</div>

For text-to-image showcases of HunyuanImage-3.0, see:

- [HunyuanImage-3.0](./Hunyuan-Image3.md)

### Showcases of HunyuanImage-3.0-Instruct

HunyuanImage-3.0-Instruct demonstrates powerful capabilities in intelligent image generation and editing. The following showcases highlight its core features:

* 🧠 **Intelligent Visual Understanding and Reasoning (CoT Think)**: The model performs structured thinking to analyze the user's input image and prompt, expanding the user's intent and editing task into structured, comprehensive instructions that break complex prompts and editing tasks down into detailed visual components such as subject, composition, lighting, color palette, and style. This leads to better image generation and editing performance.

* ✏️ **Prompt Self-Rewrite**: Automatically enhances sparse or vague prompts into professional-grade, detail-rich descriptions that capture the user's intent more accurately.

* 🎨 **Text-to-Image (T2I)**: Generates high-quality images from text prompts with exceptional prompt adherence and photorealistic quality.

* 🖼️ **Image-to-Image (TI2I)**: Supports creative image editing, including adding elements, removing objects, modifying styles, and seamless background replacement while preserving key visual elements.

* 🔀 **Multi-Image Fusion**: Intelligently combines multiple reference images (up to 3 inputs) to create coherent composite images that integrate visual elements from different sources.


**Showcase 1: Detailed Thought and Reasoning Process**

<div align="center">
<img src="./assets/pg_instruct_imgs/cot_ti2i.gif" alt="HunyuanImage-3.0-Instruct Showcase 1" width="90%">
</div>

**Showcase 2: Creative T2I Generation with Complex Scene Understanding**

> Prompt: 3D 毛绒质感拟人化马,暖棕浅棕肌理,穿藏蓝西装、白衬衫,戴深棕手套;疲惫带期待,坐于电脑前,旁置印 "HAPPY AGAIN" 的马克杯。橙红渐变背景,配超大号藏蓝粗体 "马上下班",叠加米黄 "Happy New Year" 并标 "(2026)"。橙红为主,藏蓝米黄撞色,毛绒温暖柔和。
>
> (Translation: a 3D plush-textured anthropomorphic horse with a warm light-brown coat, wearing a navy suit, white shirt, and dark-brown gloves; tired yet expectant, sitting at a computer with a mug printed "HAPPY AGAIN" beside it. Orange-red gradient background with oversized bold navy text "马上下班" ("off work soon"), overlaid with beige "Happy New Year" and the mark "(2026)". Predominantly orange-red with contrasting navy and beige; warm, soft plush feel.)

<div align="center">
<img src="./assets/pg_instruct_imgs/image0.png" alt="HunyuanImage-3.0-Instruct Showcase 2" width="75%">
</div>

**Showcase 3: Precise Image Editing with Element Preservation**

<div align="center">
<img src="./assets/pg_instruct_imgs/image1.png" alt="HunyuanImage-3.0-Instruct Showcase 3" width="85%">
</div>

**Showcase 4: Style Transformation with Thematic Enhancement**

<div align="center">
<img src="./assets/pg_instruct_imgs/image2.png" alt="HunyuanImage-3.0-Instruct Showcase 4" width="85%">
</div>

**Showcase 5: Advanced Style Transfer and Product Mockup Generation**

<div align="center">
<img src="./assets/pg_instruct_imgs/image3.png" alt="HunyuanImage-3.0-Instruct Showcase 5" width="85%">
</div>

**Showcase 6: Multi-Image Fusion and Creative Composition**

<div align="center">
<img src="./assets/pg_instruct_imgs/image4.png" alt="HunyuanImage-3.0-Instruct Showcase 6" width="85%">
</div>

## 📚 Citation

If you find HunyuanImage-3.0 useful in your research, please cite our work:

```bibtex
@article{cao2025hunyuanimage,
  title={HunyuanImage 3.0 Technical Report},
  author={Cao, Siyu and Chen, Hangting and Chen, Peng and Cheng, Yiji and Cui, Yutao and Deng, Xinchi and Dong, Ying and Gong, Kipper and Gu, Tianpeng and Gu, Xiusen and others},
  journal={arXiv preprint arXiv:2509.23951},
  year={2025}
}
```

## 🙏 Acknowledgements

We extend our heartfelt gratitude to the following open-source projects and communities for their invaluable contributions:

* 🤗 [Transformers](https://github.com/huggingface/transformers) - State-of-the-art NLP library
* 🎨 [Diffusers](https://github.com/huggingface/diffusers) - Diffusion models library
* 🌐 [HuggingFace](https://huggingface.co/) - AI model hub and community
* ⚡ [FlashAttention](https://github.com/Dao-AILab/flash-attention) - Memory-efficient attention
* 🚀 [FlashInfer](https://github.com/flashinfer-ai/flashinfer) - Optimized inference engine

## 🌟🚀 GitHub Star History

[![GitHub stars](https://img.shields.io/github/stars/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
[![GitHub forks](https://img.shields.io/github/forks/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)


[![Star History Chart](https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanImage-3.0&type=Date)](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)