tags:
- code
---

# Fast Training of Diffusion Models with Masked Transformers

This repository hosts large model checkpoints and the PyTorch implementation for [MaskDiT](https://github.com/Anima-Lab/MaskDiT.git).

Paper:<br>
**[Fast Training of Diffusion Models with Masked Transformers](https://openreview.net/pdf?id=vTBjBtGioE)**
<br>
Hongkai Zheng*, Weili Nie*, Arash Vahdat, Anima Anandkumar <br>
original training time. Thus, our method shows a promising way of efficiently training large transformer-based diffusion
models without sacrificing the generative performance.*

<img src="assets/figs/repo_head.png" alt="Architecture" width="900" height="500" style="display: block;"/>
</div>
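The masking idea summarized in the abstract, training the transformer on a random subset of patch tokens, can be sketched as follows. This is an illustrative NumPy sketch, not the repository's implementation; `mask_patches` is a hypothetical helper.

```python
import numpy as np

def mask_patches(tokens: np.ndarray, mask_ratio: float = 0.5, rng=None):
    """Randomly drop a fraction of patch tokens, as in masked training.

    tokens: (N, D) array of patch tokens. Returns the kept tokens plus the
    sorted indices of kept and masked positions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask_idx = np.sort(perm[n_keep:])
    return tokens[keep_idx], keep_idx, mask_idx

# Tiny demo: 16 patch tokens of dimension 4, half of them masked out.
toks = np.arange(64, dtype=float).reshape(16, 4)
kept, keep_idx, mask_idx = mask_patches(toks, mask_ratio=0.5,
                                        rng=np.random.default_rng(0))
```

Only the kept tokens are fed through the heavy transformer blocks, which is where the training-cost saving comes from.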

## Requirements
<p align='center'> Generated samples from MaskDiT 512x512 with CFG (scale=1.5).
</p>
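CFG in the caption above is classifier-free guidance: the unconditional prediction is pushed toward the conditional one by the guidance scale. A minimal sketch of the standard combination (generic formulation, not this repository's sampler; `cfg_combine` is a hypothetical name):

```python
import numpy as np

def cfg_combine(eps_cond: np.ndarray, eps_uncond: np.ndarray,
                scale: float = 1.5) -> np.ndarray:
    # Classifier-free guidance: move from the unconditional prediction
    # toward (and past, for scale > 1) the conditional prediction.
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Demo with constant predictions: scale=1.5 overshoots the conditional one.
guided = cfg_combine(np.ones(4), np.zeros(4), scale=1.5)
```

With `scale=1.0` this reduces to the plain conditional prediction; larger scales trade diversity for sample fidelity.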

## Dataset

We use the pre-trained VAE to first encode the ImageNet dataset into latent space. You can download the ImageNet-256x256
and ImageNet-512x512 datasets that have been encoded into latent space from [hzzheng/MaskDiT-imagenet](https://huggingface.co/datasets/hzzheng/MaskDiT-imagenet).
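One way to fetch the encoded latents programmatically is `huggingface_hub.snapshot_download`; the helper below is an illustrative sketch (the wrapper name and local directory are assumptions, and the full dataset repo is large):

```python
def download_latents(local_dir: str = "data/imagenet-latents") -> str:
    """Download the encoded-latent dataset from the Hugging Face Hub.

    NOTE: the full repo is large; once you know its file layout, pass
    allow_patterns to snapshot_download to restrict to one resolution.
    """
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(
        repo_id="hzzheng/MaskDiT-imagenet",
        repo_type="dataset",
        local_dir=local_dir,
    )

if __name__ == "__main__":
    print(download_latents())
```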

### LMDB to Webdataset

### Citation

```
@inproceedings{Zheng2024MaskDiT,
title={Fast Training of Diffusion Models with Masked Transformers},