---
library_name: diffusers
license: mit
pipeline_tag: unconditional-image-generation
---

# Autoregressive Image Generation without Vector Quantization
|
## About
This model (MAR) introduces a novel approach to autoregressive image generation by eliminating the need for vector quantization.
Instead of relying on discrete tokens, the model operates in a continuous-valued space, using a diffusion process to model the per-token probability distribution.
By employing a Diffusion Loss function, the model achieves efficient, high-quality image generation while retaining the speed advantages of autoregressive sequence modeling.
This approach simplifies the generation process and makes it applicable to continuous-valued domains beyond image synthesis.
It is based on the paper [Autoregressive Image Generation without Vector Quantization](https://arxiv.org/abs/2406.11838).
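The Diffusion Loss idea can be illustrated with a toy sketch. This is not the paper's actual architecture: the linear "denoiser", the dimensions, and the noise schedule below are invented for illustration only. The point is that each continuous token is scored by how well a small network predicts the noise added to it, instead of by a softmax over a discrete codebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# A continuous-valued token (e.g. a latent patch) -- no codebook lookup.
z = rng.normal(size=16)     # ground-truth token
cond = rng.normal(size=16)  # conditioning vector from the autoregressive backbone

# Toy linear "denoiser" standing in for the small MLP the paper trains.
W = rng.normal(scale=0.1, size=(16, 32))

def diffusion_loss(z, cond, t):
    """Noise the token at level t in (0, 1), then score a noise prediction (MSE)."""
    eps = rng.normal(size=z.shape)               # target noise
    z_t = np.sqrt(1 - t) * z + np.sqrt(t) * eps  # noised token
    eps_pred = W @ np.concatenate([z_t, cond])   # toy noise prediction
    return float(np.mean((eps_pred - eps) ** 2))

loss = diffusion_loss(z, cond, t=0.5)
print(loss)
```

In the real model this loss is minimized per token, so sampling a token amounts to running a small reverse-diffusion process conditioned on the autoregressive context.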
|
## Usage
You can load the model through the Hugging Face `DiffusionPipeline` and optionally customize parameters such as the model type, number of autoregressive steps, and class labels.
|
```python
from diffusers import DiffusionPipeline

# load the pretrained model
pipeline = DiffusionPipeline.from_pretrained("jadechoghari/mar", trust_remote_code=True, custom_pipeline="jadechoghari/mar")

# generate an image with the model
generated_image = pipeline(
    model_type="mar_huge",         # choose from 'mar_base', 'mar_large', or 'mar_huge'
    seed=42,                       # set a seed for reproducibility
    num_ar_steps=64,               # number of autoregressive steps
    class_labels=[207, 360, 388],  # provide valid ImageNet class labels
    cfg_scale=4,                   # classifier-free guidance scale
    output_dir="./images",         # directory to save generated images
    cfg_schedule="constant",       # choose between 'constant' (suggested) and 'linear'
)

# display the generated image
generated_image.show()
```
|
<p align="center">
  <img src="https://github.com/LTH14/mar/raw/main/demo/visual.png" width="500">
</p>

This code loads the model, configures it for image generation, and saves the output to the specified directory.
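The `cfg_scale` argument follows the standard classifier-free guidance rule, which mixes an unconditional and a conditional prediction. A minimal sketch of that rule (the arrays below are illustrative placeholders, not real model outputs):

```python
import numpy as np

def cfg_combine(pred_uncond, pred_cond, cfg_scale):
    """Classifier-free guidance: move the prediction away from the
    unconditional output, in the direction of the conditional one."""
    return pred_uncond + cfg_scale * (pred_cond - pred_uncond)

pred_uncond = np.array([0.0, 0.0])
pred_cond = np.array([1.0, 2.0])

print(cfg_combine(pred_uncond, pred_cond, 1.0))  # scale 1 recovers the conditional prediction
print(cfg_combine(pred_uncond, pred_cond, 4.0))  # scale 4 amplifies the conditioning signal
```

Higher scales push samples harder toward the class label at some cost in diversity, which is why a moderate value like 4 is a common starting point.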
|
We offer three pre-trained MAR models in `safetensors` format:
- `mar-base.safetensors`
- `mar-large.safetensors`
- `mar-huge.safetensors`
|
|
|
This is a Hugging Face Diffusers (GPU) implementation of the paper [Autoregressive Image Generation without Vector Quantization](https://arxiv.org/abs/2406.11838).
|
The official PyTorch implementation is released in [this repository](https://github.com/LTH14/mar).
|
```bibtex
@article{li2024autoregressive,
  title={Autoregressive Image Generation without Vector Quantization},
  author={Li, Tianhong and Tian, Yonglong and Li, He and Deng, Mingyang and He, Kaiming},
  journal={arXiv preprint arXiv:2406.11838},
  year={2024}
}
```
|
## Acknowledgements
We thank Congyue Deng and Xinlei Chen for helpful discussions. We thank the Google TPU Research Cloud (TRC) for granting us access to TPUs, and Google Cloud Platform for supporting GPU resources.
|
A large portion of the code in this repo is based on [MAE](https://github.com/facebookresearch/mae), [MAGE](https://github.com/LTH14/mage), and [DiT](https://github.com/facebookresearch/DiT).
|
## Contact

If you have any questions, feel free to contact me through email (tianhong@mit.edu). Enjoy!