tags:
- code
---

# Fast Training of Diffusion Models with Masked Transformers

This repository hosts large model checkpoints and the PyTorch implementation for [MaskDiT](https://github.com/Anima-Lab/MaskDiT.git).

Paper:<br>
**[Fast Training of Diffusion Models with Masked Transformers](https://openreview.net/pdf?id=vTBjBtGioE)**
<br>
Hongkai Zheng*, Weili Nie*, Arash Vahdat, Anima Anandkumar <br>
original training time. Thus, our method shows a promising way of efficiently training large transformer-based diffusion
models without sacrificing the generative performance.*

<img src="assets/figs/repo_head.png" alt="Architecture" width="900" height="500" style="display: block;"/>
</div>
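The masking idea summarized in the abstract, training the transformer on a random subset of patch tokens, can be sketched as follows. This is an illustrative NumPy sketch, not the repository's implementation; `mask_patches` is a hypothetical helper.

```python
import numpy as np

def mask_patches(tokens: np.ndarray, mask_ratio: float = 0.5, rng=None):
    """Randomly drop a fraction of patch tokens, as in masked training.

    tokens: (N, D) array of patch tokens. Returns the kept tokens plus the
    sorted indices of kept and masked positions.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])
    mask_idx = np.sort(perm[n_keep:])
    return tokens[keep_idx], keep_idx, mask_idx

# Tiny demo: 16 patch tokens of dimension 4, half of them masked out.
toks = np.arange(64, dtype=float).reshape(16, 4)
kept, keep_idx, mask_idx = mask_patches(toks, mask_ratio=0.5,
                                        rng=np.random.default_rng(0))
```

Only the kept tokens are fed through the heavy transformer blocks, which is where the training-cost saving comes from.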

## Requirements
<p align='center'> Generated samples from MaskDiT 512x512 with CFG (scale=1.5).
</p>
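CFG in the caption above is classifier-free guidance: the unconditional prediction is pushed toward the conditional one by the guidance scale. A minimal sketch of the standard combination (generic formulation, not this repository's sampler; `cfg_combine` is a hypothetical name):

```python
import numpy as np

def cfg_combine(eps_cond: np.ndarray, eps_uncond: np.ndarray,
                scale: float = 1.5) -> np.ndarray:
    # Classifier-free guidance: move from the unconditional prediction
    # toward (and past, for scale > 1) the conditional prediction.
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Demo with constant predictions: scale=1.5 overshoots the conditional one.
guided = cfg_combine(np.ones(4), np.zeros(4), scale=1.5)
```

With `scale=1.0` this reduces to the plain conditional prediction; larger scales trade diversity for sample fidelity.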

## Dataset

We use the pre-trained VAE to first encode the ImageNet dataset into latent space. You can download the ImageNet-256x256
and ImageNet-512x512 datasets that have been encoded into latent space from [hzzheng/MaskDiT-imagenet](https://huggingface.co/datasets/hzzheng/MaskDiT-imagenet).
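One way to fetch the encoded latents programmatically is `huggingface_hub.snapshot_download`; the helper below is an illustrative sketch (the wrapper name and local directory are assumptions, and the full dataset repo is large):

```python
def download_latents(local_dir: str = "data/imagenet-latents") -> str:
    """Download the encoded-latent dataset from the Hugging Face Hub.

    NOTE: the full repo is large; once you know its file layout, pass
    allow_patterns to snapshot_download to restrict to one resolution.
    """
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(
        repo_id="hzzheng/MaskDiT-imagenet",
        repo_type="dataset",
        local_dir=local_dir,
    )

if __name__ == "__main__":
    print(download_latents())
```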

### LMDB to Webdataset

### Citation

```
@inproceedings{Zheng2024MaskDiT,
title={Fast Training of Diffusion Models with Masked Transformers},