hzzheng committed on
Commit 8fccd3f · verified · 1 Parent(s): 340bb57

Update README.md

Files changed (1):
  1. README.md +6 -13
README.md CHANGED
@@ -5,9 +5,11 @@ datasets:
 tags:
 - code
 ---
+
 # Fast Training of Diffusion Models with Masked Transformers
+This repository hosts large model checkpoints and the PyTorch implementation for [MaskDiT](https://github.com/Anima-Lab/MaskDiT.git).
 
-Official PyTorch implementation of the TMLR 2024 paper:<br>
+Paper:<br>
 **[Fast Training of Diffusion Models with Masked Transformers](https://openreview.net/pdf?id=vTBjBtGioE)**
 <br>
 Hongkai Zheng*, Weili Nie*, Arash Vahdat, Anima Anandkumar <br>
@@ -25,9 +27,7 @@ generative performance than the state-of-the-art Diffusion Transformer (DiT) mod
 original training time. Thus, our method shows a promising way of efficiently training large transformer-based diffusion
 models without sacrificing the generative performance.*
 
-<div align='center'>
-<img src="assets/figs/repo_head.png" alt="Architecture" width="900" height="500" style="display: block;"/>
-</div>
+
 
 ## Requirements
 
@@ -75,16 +75,10 @@ python3 generate.py --config configs/test/maskdit-512.yaml --ckpt_path [path to
 <p align='center'> Generated samples from MaskDiT 512x512 with CFG (scale=1.5).
 <p\>
 
-## Prepare dataset
+## Dataset
 
 We use the pre-trained VAE to first encode the ImageNet dataset into latent space. You can download the ImageNet-256x256
-and ImageNet-512x512 that have been encoded into latent space by running
-
-```bash
-bash scripts/download_assets.sh
-```
-
-`extract_latent.py` was used to encode the ImageNet.
+and ImageNet-512x512 that have been encoded into latent space from [hzzheng/MaskDiT-imagenet](https://huggingface.co/datasets/hzzheng/MaskDiT-imagenet).
 
 ### LMDB to Webdataset
 
@@ -151,7 +145,6 @@ from [EDM repo](https://github.com/NVlabs/edm), to evaluate the generated sample
 
 
 ### Citation
-
 ```
 @inproceedings{Zheng2024MaskDiT,
 title={Fast Training of Diffusion Models with Masked Transformers},
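The dataset section of the diff describes encoding ImageNet into latent space with a pre-trained VAE before training. As a rough sketch of the shape arithmetic involved (assuming an SD-style VAE with 8x spatial downsampling and 4 latent channels, which DiT-style pipelines typically use; the repo's actual VAE configuration may differ):

```python
def latent_shape(image_size: int, down_factor: int = 8, latent_channels: int = 4):
    """Return the (C, H, W) latent shape an SD-style VAE produces
    for a square image of side `image_size` (illustrative assumption:
    8x spatial downsampling, 4 latent channels)."""
    assert image_size % down_factor == 0, "image side must be divisible by the downsampling factor"
    side = image_size // down_factor
    return (latent_channels, side, side)

print(latent_shape(256))  # (4, 32, 32) for ImageNet-256x256
print(latent_shape(512))  # (4, 64, 64) for ImageNet-512x512
```

Under these assumptions a 256x256 image becomes a 32x32x4 latent, which is why training on pre-encoded latents is so much cheaper than training in pixel space.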
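The diff also retains an "LMDB to Webdataset" conversion step. A WebDataset shard is an ordinary tar archive in which all files sharing a basename form one sample. A minimal standard-library sketch of writing such a shard (the `.latent`/`.cls` extensions and file names here are illustrative, not the repo's actual schema):

```python
import io
import tarfile

def write_webdataset_shard(path, samples):
    """Write (key, latent_bytes, label) triples as a WebDataset-style tar.
    Files that share a basename (e.g. 000000.latent and 000000.cls)
    are grouped into one sample by WebDataset readers."""
    with tarfile.open(path, "w") as tar:
        for key, latent_bytes, label in samples:
            for name, payload in ((f"{key}.latent", latent_bytes),
                                  (f"{key}.cls", str(label).encode())):
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

# Hypothetical usage: one sample with a 16-byte latent blob and class label 7.
write_webdataset_shard("shard-000000.tar", [("000000", b"\x00" * 16, 7)])
```

Because the shard is a plain tar file, it can be inspected with `tar tf shard-000000.tar` and streamed sequentially during training, which is the main appeal of the format over random-access stores like LMDB.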