amd/Nitro-1-SD

ascust committed on Nov 6, 2024

Commit f97787b (verified) · 1 parent: 6510327

Update README.md

Files changed (1): README.md +81 -3

README.md CHANGED

@@ -1,3 +1,81 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+datasets:
+- poloclub/diffusiondb
+base_model:
+- stabilityai/stable-diffusion-2-1-base
+pipeline_tag: text-to-image
+library_name: diffusers
+---
+# AMD Nitro Diffusion
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6355aded9c72a7e742f341a4/AsUvS7acUDLZhKOMRSH37.jpeg)
+## Introduction
+AMD Nitro Diffusion is a series of efficient text-to-image generation models, distilled on AMD Instinct™ GPUs from popular diffusion models. The release consists of:
+* Stable Diffusion 2.1 Nitro: a UNet-based one-step model distilled from [Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1-base).
+* PixArt-Sigma Nitro: a transformer-based, high-resolution one-step model distilled from [PixArt-Sigma](https://pixart-alpha.github.io/PixArt-sigma-project/).
+
+⚡️ [Open-source code](https://github.com/AMD-AIG-AIMA/AMD-Diffusion-Distillation)! The models are based on our re-implementation of [Latent Adversarial Diffusion Distillation](https://arxiv.org/abs/2403.12015), the method used to build the popular Stable Diffusion 3 Turbo model. Since the original authors did not release training code, we are releasing our re-implementation to help advance research in the field.
+## Details
+* **Model architecture**: Stable Diffusion 2.1 Nitro has the same architecture as Stable Diffusion 2.1 and is compatible with the diffusers pipeline.
+* **Inference steps**: This model is distilled to perform inference in just a single step. However, the training code also supports distilling a model for 2, 4 or 8 steps.
+* **Hardware**: We use a single node consisting of 4 AMD Instinct™ MI250 GPUs for distilling Stable Diffusion 2.1 Nitro.
+* **Dataset**: We use 1M prompts from [DiffusionDB](https://huggingface.co/datasets/poloclub/diffusiondb) and generate the corresponding images with the base Stable Diffusion 2.1 model.
+* **Training cost**: The distillation process achieves reasonable results in less than 2 days on a single node.
+## Quickstart
+```python
+import torch
+from diffusers import DDPMScheduler, DiffusionPipeline
+
+# Load the Stable Diffusion 2.1 base pipeline with a DDPM scheduler.
+scheduler = DDPMScheduler.from_pretrained("stabilityai/stable-diffusion-2-1-base", subfolder="scheduler")
+pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base", scheduler=scheduler)
+
+# Replace the base UNet weights with the distilled Nitro UNet weights.
+ckpt_path = '<path to distilled checkpoint>'
+unet_state_dict = torch.load(ckpt_path, map_location="cpu")
+pipe.unet.load_state_dict(unet_state_dict)
+pipe = pipe.to("cuda")  # ROCm builds of PyTorch also expose AMD GPUs as "cuda"
+
+# Single denoising step from t=999; guidance_scale=0 disables classifier-free guidance.
+image = pipe(prompt='a photo of a cat',
+             num_inference_steps=1,
+             guidance_scale=0,
+             timesteps=[999]).images[0]
+image.save("cat.png")
+```
+For more details on training and evaluation, please visit the [GitHub repo](https://github.com/AMD-AIG-AIMA/AMD-Diffusion-Distillation).
+## Results
+Compared to the [Stable Diffusion 2.1 base model](https://huggingface.co/stabilityai/stable-diffusion-2-1-base), we achieve a 95.9% reduction in FLOPs at the cost of just a 2.5% lower CLIP score and a 2.2% higher FID.
+
+| Model | FID &darr; | CLIP &uarr; | FLOPs | Latency on AMD Instinct MI250 (sec) |
+| :---: | :---: | :---: | :---: | :---: |
+| Stable Diffusion 2.1 base, 50 steps (cfg=7.5) | 25.47 | 0.3286 | 83.04 | 4.94 |
+| **Stable Diffusion 2.1 Nitro**, 1 step | 26.04 | 0.3204 | 3.36 | 0.18 |
+## License
+Copyright (c) 2018-2024 Advanced Micro Devices, Inc. All Rights Reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
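The Quickstart in the README diff above runs a single `DDPMScheduler` step at `timesteps=[999]`. As we read the diffusers DDPM update, when that step is the last one in the schedule it returns the model's clean-latent estimate (plus negligible noise), so one-step inference reduces to the standard conversion from a noise prediction to an x0 estimate. The numpy sketch below illustrates that conversion under stated assumptions: the `scaled_linear` beta schedule (0.00085 to 0.012 over 1000 steps) that we believe the `stable-diffusion-2-1-base` scheduler config uses, and epsilon prediction. It loads no model; random arrays stand in for the latent and the UNet output, and the helper names are hypothetical.

```python
import numpy as np

# Assumption: the scheduler of stabilityai/stable-diffusion-2-1-base uses a
# "scaled_linear" schedule with these settings (check its scheduler_config.json).
NUM_TRAIN_TIMESTEPS = 1000
BETA_START = 0.00085
BETA_END = 0.012


def scaled_linear_alphas_cumprod(num_steps=NUM_TRAIN_TIMESTEPS,
                                 beta_start=BETA_START, beta_end=BETA_END):
    """Cumulative product of (1 - beta_t) for a 'scaled_linear' beta schedule."""
    # Interpolate linearly in sqrt(beta) space, then square back to beta.
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_steps) ** 2
    return np.cumprod(1.0 - betas)


def predict_x0_from_eps(x_t, eps, alpha_bar_t):
    """Hypothetical helper: invert x_t = sqrt(a) * x0 + sqrt(1 - a) * eps for x0."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)


alphas_cumprod = scaled_linear_alphas_cumprod()
t = 999                      # the single timestep passed as timesteps=[999]
a_t = alphas_cumprod[t]      # small value: x_t at t=999 is almost pure noise

# Toy arrays stand in for a clean latent and for the UNet's noise prediction.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 64, 64))
eps = rng.standard_normal((4, 64, 64))
x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

# With a perfect noise prediction the one-step estimate recovers x0
# (up to floating-point error); a real model's estimate is only approximate.
x0_hat = predict_x0_from_eps(x_t, eps, a_t)
print(f"alpha_bar at t=999: {a_t:.5f}")
print(f"max |x0_hat - x0|: {np.abs(x0_hat - x0).max():.2e}")
```

In the actual pipeline, `eps` comes from the UNet evaluated on `x_t` and the prompt, so the quality of this single estimate depends entirely on the distilled weights.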
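The percentages quoted in the Results paragraph of the README diff above follow directly from its table. As a quick sanity check, the Python snippet below recomputes them from the reported values; the latency ratio is an extra derived figure that the README itself does not state. This is arithmetic on the published numbers only, not a new measurement.

```python
# Numbers copied from the Results table (FLOPs in the table's own units).
base = {"fid": 25.47, "clip": 0.3286, "flops": 83.04, "latency_s": 4.94}
nitro = {"fid": 26.04, "clip": 0.3204, "flops": 3.36, "latency_s": 0.18}


def relative_change_pct(new: float, old: float) -> float:
    """Signed change of `new` relative to `old`, in percent."""
    return 100.0 * (new - old) / old


flops_reduction = -relative_change_pct(nitro["flops"], base["flops"])  # ~95.95
clip_drop = -relative_change_pct(nitro["clip"], base["clip"])          # ~2.50
fid_increase = relative_change_pct(nitro["fid"], base["fid"])          # ~2.24
speedup = base["latency_s"] / nitro["latency_s"]                       # ~27.4

print(f"FLOPs reduction: {flops_reduction:.2f}%")
print(f"CLIP score drop: {clip_drop:.2f}%")
print(f"FID increase:    {fid_increase:.2f}%")
print(f"MI250 latency speedup: {speedup:.1f}x")
```

The FLOPs reduction comes out at about 95.95%, which the README quotes as 95.9%; the CLIP and FID changes match the quoted 2.5% and 2.2% after rounding.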