Update README.md
README.md (CHANGED)
license_link: LICENSE
---

<div align="center">

<img src="./assets/logo.png" alt="HunyuanImage-3.0 Logo" width="600">

# 🎨 HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
<a href=https://hunyuan.tencent.com/image target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
<a href=https://huggingface.co/tencent/HunyuanImage-3.0 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
<a href=https://github.com/Tencent-Hunyuan/HunyuanImage-3.0 target="_blank"><img src=https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
<a href=./assets/HunyuanImage_3_0.pdf target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
<a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
</div>

<p align="center">
Join our <a href="./assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/ehjWMqF5wY">Discord</a> |
💻 <a href="https://hunyuan.tencent.com/modelSquare/home/play?modelId=289&from=/visual">Official website (官网): try our model!</a>
</p>
```bash
# 1. First install PyTorch (CUDA 12.8 version)
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128

# 2. Then install tencentcloud-sdk
pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-sdk-python

# 3. Then install the other dependencies
pip install -r requirements.txt
```
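After these steps, a quick import check confirms that the pinned packages actually resolved in the current environment; the helper below is a minimal, hypothetical sketch (not part of the repo):

```python
import importlib.util

def check_env(packages=("torch", "torchvision", "torchaudio")):
    """Report whether each pinned package is importable in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}

for name, ok in check_env().items():
    print(f"{name}: {'ok' if ok else 'missing - rerun the matching pip step'}")
```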
#### 3️⃣ Run the Demo

The pretrain checkpoint does not automatically rewrite or enhance input prompts; for optimal results, we currently recommend that community partners use DeepSeek to rewrite prompts.

```bash
# Set the environment variables for DeepSeek prompt rewriting
export DEEPSEEK_KEY_ID="your_deepseek_key_id"
export DEEPSEEK_KEY_SECRET="your_deepseek_key_secret"

python3 run_image_gen.py --model-id ./HunyuanImage-3 --verbose 1 --sys-deepseek-prompt "universal" --prompt "A brown and white dog is running on the grass"
```
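A missing credential otherwise only surfaces once generation reaches the rewriting step, after the model has already loaded, so a fail-fast check can save a model load. The helper below is hypothetical; only the two variable names come from the export commands above:

```python
import os

def deepseek_credentials_present() -> bool:
    """Return True only when both DeepSeek credentials are set and non-empty."""
    required = ("DEEPSEEK_KEY_ID", "DEEPSEEK_KEY_SECRET")
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        print("Prompt rewriting unavailable; missing:", ", ".join(missing))
    return not missing

if not deepseek_credentials_present():
    print("Export the keys above or disable prompt rewriting before running.")
```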
#### 4️⃣ Command Line Arguments

| Argument                | Description                                                     | Default     |
|-------------------------|-----------------------------------------------------------------|-------------|
| `--prompt`              | Input prompt                                                    | (required)  |
| `--model-id`            | Model path                                                      | (required)  |
| `--attn-impl`           | Attention implementation: `sdpa` or `flash_attention_2`         | `sdpa`      |
| `--moe-impl`            | MoE implementation: `eager` or `flashinfer`                     | `eager`     |
| `--seed`                | Random seed for image generation                                | `None`      |
| `--diff-infer-steps`    | Number of diffusion inference steps                             | `50`        |
| `--image-size`          | Image resolution: `auto`, an explicit size such as `1280x768`, or an aspect ratio such as `16:9` | `auto` |
| `--save`                | Image save path                                                 | `image.png` |
| `--verbose`             | Verbosity level: `0` = no log, `1` = log inference information  | `0`         |
| `--rewrite`             | Whether to enable prompt rewriting                              | `True`      |
| `--sys-deepseek-prompt` | System prompt preset: `universal` or `text_rendering`           | `universal` |
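For readers scripting around `run_image_gen.py`, the table maps naturally onto `argparse`. The sketch below is a hypothetical mirror of the documented flags, not the repo's actual parser (`--rewrite` is omitted because its boolean parsing is not documented):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical re-creation of the documented CLI; the real
    # run_image_gen.py parser may differ in details.
    p = argparse.ArgumentParser(description="HunyuanImage-3.0 image generation")
    p.add_argument("--prompt", required=True, help="Input prompt")
    p.add_argument("--model-id", required=True, help="Model path")
    p.add_argument("--attn-impl", choices=["sdpa", "flash_attention_2"], default="sdpa")
    p.add_argument("--moe-impl", choices=["eager", "flashinfer"], default="eager")
    p.add_argument("--seed", type=int, default=None)
    p.add_argument("--diff-infer-steps", type=int, default=50)
    p.add_argument("--image-size", default="auto")
    p.add_argument("--save", default="image.png")
    p.add_argument("--verbose", type=int, choices=[0, 1], default=0)
    p.add_argument("--sys-deepseek-prompt",
                   choices=["universal", "text_rendering"], default="universal")
    return p

args = build_parser().parse_args(
    ["--prompt", "A brown and white dog is running on the grass",
     "--model-id", "./HunyuanImage-3"]
)
print(args.attn_impl, args.diff_infer_steps, args.save)
```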
### 🎨 Interactive Gradio Demo
* 🤗 [HuggingFace](https://huggingface.co/) - AI model hub and community
* ⚡ [FlashAttention](https://github.com/Dao-AILab/flash-attention) - Memory-efficient attention
* 🚀 [FlashInfer](https://github.com/flashinfer-ai/flashinfer) - Optimized inference engine