Update README.md

---
license: afl-3.0
---

# Hi-MAR

<p align="center">
  <img src="assets/show_imgs.png" width="800"/>
</p>

<p align="center">
π₯οΈ <a href="https://github.com/HiDream-ai/himar">GitHub</a> &nbsp;|&nbsp; π <a href="https://Tom-zgt.github.io/Hi-MAR-page/"><b>Project Page</b></a> &nbsp;|&nbsp; π€ <a href="https://huggingface.co/HiDream-ai/Hi-MAR/tree/main">Hugging Face</a> &nbsp;|&nbsp; π <a href="">Paper</a> &nbsp;|&nbsp; π <a href="">PDF</a>
<br>
</p>

[**Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots**](https://Tom-zgt.github.io/Hi-MAR-page/) (ICML 2025)<br>

This is the official repository for the paper "Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots".

## Overview

We present Hierarchical Masked Autoregressive models (Hi-MAR), which pivot on low-resolution image tokens to trigger hierarchical autoregressive modeling in a multi-phase manner.

#### π What We Aim to Solve

- **Inability to use global context** in the early-stage predictions of the next-token paradigm
- **Training-inference discrepancy** across multi-scale predictions
- **Suboptimal modeling of multi-scale probability distributions**
- **Lack of global information** in the denoising process of the MLP-based diffusion head

## π₯ Updates

- [x] **\[2025.05.22\]** Upload inference code and pretrained class-conditional Hi-MAR models trained on ImageNet 256x256.

## ππΌ Inference

<details open>
<summary><strong>Environment Requirement</strong></summary>

Clone the repo:

```
git clone https://github.com/HiDream-ai/himar.git
cd himar
```

Install dependencies:

```
conda env create -f environment.yaml
conda activate himar
```

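As a quick sanity check that the environment is ready, the following sketch assumes PyTorch is among the dependencies installed by `environment.yaml` (the `torchrun` commands below require it):

```
# Check that the activated environment sees PyTorch and the GPUs
# (assumes torch is installed by environment.yaml).
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
```
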
</details>

<details open>
<summary><strong>Model Download</strong></summary>

Download the VAE (kl16.ckpt) from the [link](https://www.dropbox.com/scl/fi/hhmuvaiacrarfg28qxhwz/kl16.ckpt?rlkey=l44xipsezc8atcffdp4q7mwmh&dl=0) provided in the [MAR GitHub](https://github.com/LTH14/mar/).

You can download our pre-trained Hi-MAR models directly from the links below.

| Models | FID-50K | Inception Score | #Params |
| ------ | ------- | --------------- | ------- |
| [Hi-MAR-B](https://huggingface.co/HiDream-ai/Hi-MAR/blob/main/Hi-MAR-B/checkpoint-last.pth) | 1.93 | 293.0 | 244M |
| [Hi-MAR-L](https://huggingface.co/HiDream-ai/Hi-MAR/blob/main/Hi-MAR-L/checkpoint-last.pth) | 1.66 | 322.3 | 529M |
| [Hi-MAR-H](https://huggingface.co/HiDream-ai/Hi-MAR/blob/main/Hi-MAR-H/checkpoint-last.pth) | 1.52 | 322.78 | 1090M |

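If you prefer the command line, the same checkpoints can also be fetched with the Hugging Face CLI; a minimal sketch (the `./checkpoints` target directory is an arbitrary choice, and the resulting path is what you would pass to `--resume` in the evaluation commands below):

```
# Download a Hi-MAR checkpoint from the command line
# (./checkpoints is an arbitrary local directory).
pip install -U "huggingface_hub[cli]"
huggingface-cli download HiDream-ai/Hi-MAR Hi-MAR-B/checkpoint-last.pth --local-dir ./checkpoints
```
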
</details>

<details open>
<summary><strong>Evaluation</strong></summary>

Evaluate Hi-MAR-B on ImageNet 256x256:

```
torchrun --nproc_per_node=8 --nnodes=1 main_himar.py --img_size 256 --vae_path /path/to/vae --vae_embed_dim 16 --vae_stride 16 --patch_size 1 --model himar_base --diffloss_d 6 --diffloss_w 1024 --output_dir ./himar_base_test --resume /path/to/Hi-MAR-B --num_images 50000 --num_iter 4 --cfg 2.5 --re_cfg 2.7 --cfg_schedule linear --cond_scale 8 --cond_dim 16 --two_diffloss --global_dm --gdm_d 6 --gdm_w 512 --eval_bsz 256 --load_epoch -1 --head 8 --ratio 4 --cos --evaluate
```

Evaluate Hi-MAR-L on ImageNet 256x256:

```
torchrun --nproc_per_node=8 --nnodes=1 main_himar.py --img_size 256 --vae_path /path/to/vae --vae_embed_dim 16 --vae_stride 16 --patch_size 1 --model himar_large --diffloss_d 8 --diffloss_w 1280 --output_dir ./himar_large_test --resume /path/to/Hi-MAR-L --num_images 50000 --num_iter 4 --cfg 3.5 --re_cfg 3.5 --cfg_schedule linear --cond_scale 8 --cond_dim 16 --two_diffloss --global_dm --gdm_d 8 --gdm_w 512 --eval_bsz 256 --load_epoch -1 --head 8 --ratio 4 --cos --evaluate
```

Evaluate Hi-MAR-H on ImageNet 256x256:

```
torchrun --nproc_per_node=8 --nnodes=1 main_himar.py --img_size 256 --vae_path /path/to/vae --vae_embed_dim 16 --vae_stride 16 --patch_size 1 --model himar_huge --diffloss_d 12 --diffloss_w 1536 --output_dir ./himar_huge_test --resume /path/to/Hi-MAR-H --num_images 50000 --num_iter 12 --cfg 3.2 --re_cfg 5.5 --cfg_schedule linear --cond_scale 8 --cond_dim 16 --two_diffloss --global_dm --gdm_d 12 --gdm_w 768 --eval_bsz 256 --load_epoch -1 --head 12 --ratio 4 --cos --evaluate
```

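The commands above assume 8 GPUs on one node. As a sketch, running on fewer GPUs should only require adjusting `--nproc_per_node`; the reduced `--eval_bsz` here is an illustrative choice to fit single-GPU memory, not a validated setting:

```
# Illustrative single-GPU variant of the Hi-MAR-B evaluation
# (smaller per-GPU batch size; all model flags unchanged).
torchrun --nproc_per_node=1 --nnodes=1 main_himar.py --img_size 256 --vae_path /path/to/vae --vae_embed_dim 16 --vae_stride 16 --patch_size 1 --model himar_base --diffloss_d 6 --diffloss_w 1024 --output_dir ./himar_base_test --resume /path/to/Hi-MAR-B --num_images 50000 --num_iter 4 --cfg 2.5 --re_cfg 2.7 --cfg_schedule linear --cond_scale 8 --cond_dim 16 --two_diffloss --global_dm --gdm_d 6 --gdm_w 512 --eval_bsz 32 --load_epoch -1 --head 8 --ratio 4 --cos --evaluate
```
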
</details>
## π Star and Citation

If you find our work helpful for your research, please consider giving this repository a star β and citing our work.

```
```

## π Acknowledgement

<span id="acknowledgement"></span>

Thanks to [MAR](https://github.com/LTH14/mar) for their contribution.