| | --- |
| | license: apple-ascl |
| | tags: |
| | - mdm |
| | --- |
| | |
| | # Matryoshka Diffusion Models |
| |
|
| | Matryoshka Diffusion Models (MDM) were introduced in [the paper of the same name](https://huggingface.co/papers/2310.15111) by Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, and Navdeep Jaitly. |
| |
|
| | This repository contains the **Flickr 256** checkpoint. |
| |
|
| |  |
| |
|
| | ### Highlights |
| |
|
| | * This checkpoint was trained on a dataset of 50M text-image pairs collected from Flickr. |
| | * This model was trained using nested UNets at several resolutions and generates 256 × 256 images. |
| | * Despite being trained on relatively small datasets, MDMs show strong zero-shot capability for generating high-resolution images and videos. |
| |
|
| | ## Checkpoints |
| |
|
| | | Model | Dataset | Resolution | Nested UNets | |
| | |---------------------------------------------------------|------------|-------------|--------------| |
| | | [mdm-flickr-64](https://hf.co/pcuenq/mdm-flickr-64) | Flickr 50M | 64 × 64 | ❎ | |
| | | [mdm-flickr-256](https://hf.co/pcuenq/mdm-flickr-256) | Flickr 50M | 256 × 256 | ✅ | |
| | | [mdm-flickr-1024](https://hf.co/pcuenq/mdm-flickr-1024) | Flickr 50M | 1024 × 1024 | ✅ | |
| |
|
| | ## How to Use |
| |
|
| | Please refer to the [original repository](https://github.com/apple/ml-mdm) for training and inference instructions. |
| |
|
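Before running the inference code from the original repository, the checkpoint files need to be available locally. The following is a minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed. The repo ids are taken from the Checkpoints table; `repo_id_for` and `download_checkpoint` are hypothetical helper names used only for illustration, and this card does not describe the file layout inside the repository, so loading the weights is left to the original repository's instructions.

```python
# Repo ids copied from the Checkpoints table; keys are output resolutions in pixels.
CHECKPOINTS = {
    64: "pcuenq/mdm-flickr-64",
    256: "pcuenq/mdm-flickr-256",
    1024: "pcuenq/mdm-flickr-1024",
}


def repo_id_for(resolution: int) -> str:
    """Return the Hub repo id for a listed resolution (hypothetical helper)."""
    try:
        return CHECKPOINTS[resolution]
    except KeyError:
        supported = ", ".join(str(r) for r in sorted(CHECKPOINTS))
        raise ValueError(
            f"No checkpoint for resolution {resolution}; choose one of: {supported}"
        ) from None


def download_checkpoint(resolution: int = 256) -> str:
    """Download all files of the checkpoint repo and return the local directory."""
    # Imported lazily so the lookup helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_for(resolution))
```

Calling `download_checkpoint(256)` fetches this repository's files into the local Hugging Face cache and returns the directory path, which can then be passed to the scripts in the original repository.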
| | ## Citation |
| |
|
| | ```bibtex |
| | @misc{gu2023matryoshkadiffusionmodels, |
| | title={Matryoshka Diffusion Models}, |
| | author={Jiatao Gu and Shuangfei Zhai and Yizhe Zhang and Josh Susskind and Navdeep Jaitly}, |
| | year={2023}, |
| | eprint={2310.15111}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CV}, |
| | url={https://arxiv.org/abs/2310.15111}, |
| | } |
| | ``` |