zhengzhou committed · Commit 44cb66b · verified · 1 Parent(s): 8c268c7

Update README.md

Files changed (1): README.md (+108, −108)
---
license: afl-3.0
---
# Hi-MAR

<p align="center">
    <img src="assets/show_imgs.png" width="800"/>
</p>

<p align="center">
    🖥️ <a href="https://github.com/HiDream-ai/himar">GitHub</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🌐 <a href="https://Tom-zgt.github.io/Hi-MAR-page/"><b>Project Page</b></a> &nbsp;&nbsp; | &nbsp;&nbsp; 🤗 <a href="https://huggingface.co/HiDream-ai/Hi-MAR/tree/main">Hugging Face</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📑 <a href="">Paper</a> &nbsp;&nbsp; | &nbsp;&nbsp; 📖 <a href="">PDF</a>
    <br>
</p>

[**Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots**](https://Tom-zgt.github.io/Hi-MAR-page/) (ICML 2025)<br>

This is the official repository for the paper "Hierarchical Masked Autoregressive Models with Low-Resolution Token Pivots".

## Overview

We present Hierarchical Masked Autoregressive models (Hi-MAR), which pivot on low-resolution image tokens to trigger hierarchical autoregressive modeling in a multi-phase manner.

#### 🔍 What We're Working to Solve

- **Inability to exploit global context** in the early-stage predictions of the next-token paradigm
- **Training-inference discrepancy** across multi-scale predictions
- **Suboptimal modeling of multi-scale probability distributions**
- **Lack of global information in the denoising process of the MLP-based diffusion head**
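
To make the hierarchy concrete, here is a toy, framework-free sketch of the two-phase idea: a small low-resolution token grid is predicted first, then serves as a pivot that conditions the full-resolution phase. All names (`masked_predict`, the grid sizes, the random stand-in "model") are illustrative, not the repository's implementation.

```python
import random

def masked_predict(num_tokens, context, vocab_size=16):
    # Stand-in for a masked autoregressive predictor: fills all masked
    # positions conditioned on `context` (random here, for illustration).
    rng = random.Random(len(context))  # deterministic toy "model"
    return [rng.randrange(vocab_size) for _ in range(num_tokens)]

def hierarchical_generate(low_res=4, high_res=16):
    # Phase 1: predict a small low-resolution token grid. These pivot
    # tokens cheaply sketch the global structure of the image.
    pivot = masked_predict(low_res * low_res, context=[])
    # Phase 2: predict the full-resolution grid conditioned on the pivot,
    # so every high-res prediction sees the global context from phase 1.
    tokens = masked_predict(high_res * high_res, context=pivot)
    return pivot, tokens

pivot, tokens = hierarchical_generate()
print(len(pivot), len(tokens))  # 16 256
```

This is only meant to show why early predictions are no longer starved of global context: phase 2 always conditions on a complete (if coarse) picture.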


## 🔥 Updates

- [x] **\[2025.05.22\]** Upload inference code and pretrained class-conditional Hi-MAR models trained on ImageNet 256x256.

## 🏃🏼 Inference

<details open>
<summary><strong>Environment Requirement</strong></summary>

Clone the repo:

```
git clone https://github.com/HiDream-ai/himar.git
cd himar
```

Install dependencies:

```
conda env create -f environment.yaml
conda activate himar
```

</details>

<details open>
<summary><strong>Model Download</strong></summary>

Download the VAE from the [link](https://www.dropbox.com/scl/fi/hhmuvaiacrarfg28qxhwz/kl16.ckpt?rlkey=l44xipsezc8atcffdp4q7mwmh&dl=0) in the [MAR GitHub](https://github.com/LTH14/mar/).

You can download our pre-trained Hi-MAR models directly from the links below.

| Models | FID-50K | Inception Score | #Params |
| ------ | ------- | --------------- | ------- |
| [Hi-MAR-B](https://huggingface.co/HiDream-ai/Hi-MAR/blob/main/Hi-MAR-B/checkpoint-last.pth) | 1.93 | 293.0 | 244M |
| [Hi-MAR-L](https://huggingface.co/HiDream-ai/Hi-MAR/blob/main/Hi-MAR-L/checkpoint-last.pth) | 1.66 | 322.3 | 529M |
| [Hi-MAR-H](https://huggingface.co/HiDream-ai/Hi-MAR/blob/main/Hi-MAR-H/checkpoint-last.pth) | 1.52 | 322.78 | 1090M |

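The checkpoints can also be fetched programmatically. A small hypothetical helper (ours, not part of the repo) that builds direct-download URLs from the file paths in the table, using Hugging Face's standard `resolve/main` URL scheme:

```python
# Hypothetical helper: build direct-download URLs for the checkpoints above.
REPO = "https://huggingface.co/HiDream-ai/Hi-MAR/resolve/main"
CHECKPOINTS = {
    "B": "Hi-MAR-B/checkpoint-last.pth",
    "L": "Hi-MAR-L/checkpoint-last.pth",
    "H": "Hi-MAR-H/checkpoint-last.pth",
}

def checkpoint_url(size: str) -> str:
    """Return the direct URL for a Hi-MAR checkpoint ('B', 'L', or 'H')."""
    return f"{REPO}/{CHECKPOINTS[size]}"

print(checkpoint_url("B"))
# https://huggingface.co/HiDream-ai/Hi-MAR/resolve/main/Hi-MAR-B/checkpoint-last.pth
```

If you prefer cached downloads, the same `filename` paths can be passed to `huggingface_hub.hf_hub_download(repo_id="HiDream-ai/Hi-MAR", filename=...)`.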
</details>

<details open>
<summary><strong>Evaluation</strong></summary>

Evaluate Hi-MAR-B on ImageNet 256x256:

```
torchrun --nproc_per_node=8 --nnodes=1 main_himar.py --img_size 256 --vae_path /path/to/vae --vae_embed_dim 16 --vae_stride 16 --patch_size 1 --model himar_base --diffloss_d 6 --diffloss_w 1024 --output_dir ./himar_base_test --resume /path/to/Hi-MAR-B --num_images 50000 --num_iter 4 --cfg 2.5 --re_cfg 2.7 --cfg_schedule linear --cond_scale 8 --cond_dim 16 --two_diffloss --global_dm --gdm_d 6 --gdm_w 512 --eval_bsz 256 --load_epoch -1 --head 8 --ratio 4 --cos --evaluate
```

Evaluate Hi-MAR-L on ImageNet 256x256:

```
torchrun --nproc_per_node=8 --nnodes=1 main_himar.py --img_size 256 --vae_path /path/to/vae --vae_embed_dim 16 --vae_stride 16 --patch_size 1 --model himar_large --diffloss_d 8 --diffloss_w 1280 --output_dir ./himar_large_test --resume /path/to/Hi-MAR-L --num_images 50000 --num_iter 4 --cfg 3.5 --re_cfg 3.5 --cfg_schedule linear --cond_scale 8 --cond_dim 16 --two_diffloss --global_dm --gdm_d 8 --gdm_w 512 --eval_bsz 256 --load_epoch -1 --head 8 --ratio 4 --cos --evaluate
```

Evaluate Hi-MAR-H on ImageNet 256x256:

```
torchrun --nproc_per_node=8 --nnodes=1 main_himar.py --img_size 256 --vae_path /path/to/vae --vae_embed_dim 16 --vae_stride 16 --patch_size 1 --model himar_huge --diffloss_d 12 --diffloss_w 1536 --output_dir ./himar_huge_test --resume /path/to/Hi-MAR-H --num_images 50000 --num_iter 12 --cfg 3.2 --re_cfg 5.5 --cfg_schedule linear --cond_scale 8 --cond_dim 16 --two_diffloss --global_dm --gdm_d 12 --gdm_w 768 --eval_bsz 256 --load_epoch -1 --head 12 --ratio 4 --cos --evaluate
```

</details>
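The commands above pass `--cfg_schedule linear`. In MAR-style samplers this conventionally means the classifier-free guidance scale ramps from 1.0 up to the target `--cfg` value over the sampling iterations; the sketch below illustrates that convention as an assumption, not the repository's exact code:

```python
def linear_cfg(step: int, num_steps: int, cfg: float) -> float:
    # Ramp guidance linearly from 1.0 (first step) to `cfg` (last step).
    if num_steps <= 1:
        return cfg
    return 1.0 + (cfg - 1.0) * step / (num_steps - 1)

# With --num_iter 4 and --cfg 2.5 as in the Hi-MAR-B command:
print([linear_cfg(s, 4, 2.5) for s in range(4)])  # [1.0, 1.5, 2.0, 2.5]
```

Early steps are weakly guided (preserving diversity) while later steps lean harder on the class condition.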


## 🌟 Star and Citation

If you find our work helpful for your research, please consider giving this repository a ⭐ and citing our work.

```

```


## 💖 Acknowledgement

<span id="acknowledgement"></span>

Thanks to [MAR](https://github.com/LTH14/mar) for their contributions.