Commit 2c7456b
Parent(s): 3fe0a7f

Update README.md

README.md CHANGED
````diff
@@ -72,23 +72,25 @@ image = pipe(prompt).images[0]
 image.save("example.png")
 ```
 
-The above examples have been tested on a single NVIDIA GeForce RTX 3090 GPU with the following versions:
-
-```
-torch 1.13.1+cu117
-transformers 4.29.2
-diffusers 0.15.0
-```
 
 
 
 ## Compression Method
 
 ### U-Net Architecture
-
-
--
-- 0.
 
 
 ### Distillation Pretraining
````
````diff
@@ -96,7 +98,7 @@ The compact U-Net was trained to mimic the behavior of the original U-Net. We le
 
 
 <center>
-<img alt="
 </center>
 
 
````
````diff
@@ -116,23 +118,24 @@ The following table shows the zero-shot results on 30K samples from the MS-COCO
 
 | Model | FID↓ | IS↑ | CLIP Score↑<br>(ViT-g/14) | # Params,<br>U-Net | # Params,<br>Whole SDM |
 |:---:|:---:|:---:|:---:|:---:|:---:|
-| Stable Diffusion v1.4 | 13.05 | 36.76 | 0.2958 | 0.86B | 1.04B |
-| BK-SDM-Base (Ours) | 15.76 | 33.79 | 0.2878 | 0.58B | 0.76B |
-| BK-SDM-Small (Ours) | 16.98 | 31.68 | 0.2677 | 0.49B | 0.66B |
-| BK-SDM-Tiny (Ours) | 17.12 | 30.09 | 0.2653 | 0.33B | 0.50B |
 
 <br/>
 
 The following figure depicts synthesized images with some MS-COCO captions.
 
 <center>
-<img alt="Visual results" img src="https://
 </center>
 
 
 <br/>
 
 
 # Uses
 _Note: This section is taken from the [Stable Diffusion v1 model card]( https://huggingface.co/CompVis/stable-diffusion-v1-4) (which was based on the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini)) and applies in the same way to BK-SDMs_.
 
````
````diff
@@ -72,23 +72,25 @@ image = pipe(prompt).images[0]
 image.save("example.png")
 ```
 
 
 
 
 ## Compression Method
 
 ### U-Net Architecture
+Certain residual and attention blocks were eliminated from the U-Net of SDM-v1.4:
+
+- 1.04B-param [SDM-v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) (0.86B-param U-Net): the original source model.
+- 0.76B-param [**BK-SDM-Base**](https://huggingface.co/nota-ai/bk-sdm-base) (0.58B-param U-Net): obtained with ① fewer blocks in outer stages.
+- 0.66B-param [**BK-SDM-Small**](https://huggingface.co/nota-ai/bk-sdm-small) (0.49B-param U-Net): obtained with ① and ② mid-stage removal.
+- 0.50B-param [**BK-SDM-Tiny**](https://huggingface.co/nota-ai/bk-sdm-tiny) (0.33B-param U-Net): obtained with ①, ②, and ③ further inner-stage removal.
+
+
+<center>
+<img alt="U-Net architectures" img src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_arch.png" width="100%">
+</center>
+
+
 
 
 ### Distillation Pretraining
````
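The parameter counts in the U-Net Architecture list added by this commit can be cross-checked with a few lines of arithmetic. The sketch below is plain Python; the `MODELS` dictionary only copies the README's numbers, and the helper name `summarize` is hypothetical. It subtracts each U-Net size from its whole-model size and expresses each variant's U-Net and whole-model size as a reduction relative to SDM-v1.4. Because blocks are removed only from the U-Net, the remainder (in Stable Diffusion, the text encoder and VAE) should stay essentially constant; the 0.18B vs 0.17B spread most likely reflects rounding in the reported figures.

```python
# Parameter counts in billions, copied from the README's U-Net Architecture list.
# Each entry: (whole-model params, U-Net params).
MODELS = {
    "SDM-v1.4": (1.04, 0.86),
    "BK-SDM-Base": (0.76, 0.58),
    "BK-SDM-Small": (0.66, 0.49),
    "BK-SDM-Tiny": (0.50, 0.33),
}


def summarize(models, baseline="SDM-v1.4"):
    """Return {name: (non-U-Net params, U-Net reduction %, whole-model reduction %)}."""
    base_total, base_unet = models[baseline]
    rows = {}
    for name, (total, unet) in models.items():
        rest = round(total - unet, 2)  # components outside the U-Net
        unet_cut = round(100 * (1 - unet / base_unet), 1)
        total_cut = round(100 * (1 - total / base_total), 1)
        rows[name] = (rest, unet_cut, total_cut)
    return rows


for name, (rest, unet_cut, total_cut) in summarize(MODELS).items():
    print(
        f"{name:13s} non-U-Net={rest:.2f}B  "
        f"U-Net reduction={unet_cut:.1f}%  whole-model reduction={total_cut:.1f}%"
    )
```

For BK-SDM-Tiny this works out to roughly a 62% smaller U-Net and a 52% smaller whole model than SDM-v1.4.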
````diff
@@ -96,7 +98,7 @@ The compact U-Net was trained to mimic the behavior of the original U-Net. We le
 
 
 <center>
+<img alt="KD-based pretraining" img src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_kd_bksdm.png" width="100%">
 </center>
 
 
````
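The KD-based pretraining figure updated here illustrates how the compact U-Net is trained to mimic the original one. As a rough, framework-free sketch, and assuming the objective described in the BK-SDM paper rather than reproducing the authors' training code, one step's loss can be written as L = L_task + λ_out · L_outKD + λ_feat · L_featKD: the usual denoising loss, plus a term matching the teacher's noise prediction, plus terms matching intermediate features at corresponding stages. The function name `kd_pretraining_loss`, the use of flat Python lists in place of tensors, and the default weights of 1.0 are all hypothetical choices for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two non-empty flat vectors of equal length."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kd_pretraining_loss(
    noise: Vector,                    # ground-truth noise added to the latent
    student_out: Vector,              # compact U-Net's noise prediction
    teacher_out: Vector,              # original U-Net's noise prediction (teacher is frozen)
    student_feats: Sequence[Vector],  # student features, one entry per matched stage (assumed)
    teacher_feats: Sequence[Vector],  # teacher features at the same stages (assumed)
    lambda_out: float = 1.0,          # hypothetical weight of the output-level KD term
    lambda_feat: float = 1.0,         # hypothetical weight of the feature-level KD term
) -> float:
    """Sketch of a task + output-KD + feature-KD objective (assumed form)."""
    if len(student_feats) != len(teacher_feats):
        raise ValueError("student and teacher must expose the same number of stages")
    task = mse(noise, student_out)          # standard denoising loss
    out_kd = mse(teacher_out, student_out)  # imitate the teacher's final output
    feat_kd = sum(mse(t, s) for t, s in zip(teacher_feats, student_feats))  # imitate stage features
    return task + lambda_out * out_kd + lambda_feat * feat_kd


# Toy example with hand-picked numbers (not real model outputs).
loss = kd_pretraining_loss(
    noise=[0.0, 1.0],
    student_out=[0.5, 0.5],
    teacher_out=[0.0, 1.0],
    student_feats=[[1.0, 1.0]],
    teacher_feats=[[1.0, 3.0]],
)
print(loss)
```

With these toy inputs the three terms are 0.25, 0.25 and 2.0, so the call returns 2.5. In a real training loop the lists would be U-Net activations, the teacher would run without gradients, and only the student's parameters would be updated.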
````diff
@@ -116,23 +118,24 @@ The following table shows the zero-shot results on 30K samples from the MS-COCO
 
 | Model | FID↓ | IS↑ | CLIP Score↑<br>(ViT-g/14) | # Params,<br>U-Net | # Params,<br>Whole SDM |
 |:---:|:---:|:---:|:---:|:---:|:---:|
+| [Stable Diffusion v1.4](https://huggingface.co/CompVis/stable-diffusion-v1-4) | 13.05 | 36.76 | 0.2958 | 0.86B | 1.04B |
+| [BK-SDM-Base](https://huggingface.co/nota-ai/bk-sdm-base) (Ours) | 15.76 | 33.79 | 0.2878 | 0.58B | 0.76B |
+| [BK-SDM-Small](https://huggingface.co/nota-ai/bk-sdm-small) (Ours) | 16.98 | 31.68 | 0.2677 | 0.49B | 0.66B |
+| [BK-SDM-Tiny](https://huggingface.co/nota-ai/bk-sdm-tiny) (Ours) | 17.12 | 30.09 | 0.2653 | 0.33B | 0.50B |
 
 <br/>
 
 The following figure depicts synthesized images with some MS-COCO captions.
 
 <center>
+<img alt="Visual results" img src="https://netspresso-research-code-release.s3.us-east-2.amazonaws.com/assets-bk-sdm/fig_results.png" width="100%">
 </center>
 
 
 <br/>
 
 
+
 # Uses
 _Note: This section is taken from the [Stable Diffusion v1 model card]( https://huggingface.co/CompVis/stable-diffusion-v1-4) (which was based on the [DALLE-MINI model card](https://huggingface.co/dalle-mini/dalle-mini)) and applies in the same way to BK-SDMs_.
 
````
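Read as a size/quality trade-off, each BK-SDM row of the zero-shot MS-COCO table can be expressed relative to the Stable Diffusion v1.4 baseline. The short sketch below copies the table's numbers into a dictionary (the helper name `relative_to_baseline` is hypothetical). It reports the absolute change in each metric together with the U-Net size as a share of the original. As the arrows in the table header indicate, lower FID and higher IS and CLIP score are better.

```python
# Zero-shot MS-COCO numbers copied from the README table.
# Each entry: (FID, IS, CLIP score, U-Net params in billions).
RESULTS = {
    "Stable Diffusion v1.4": (13.05, 36.76, 0.2958, 0.86),
    "BK-SDM-Base": (15.76, 33.79, 0.2878, 0.58),
    "BK-SDM-Small": (16.98, 31.68, 0.2677, 0.49),
    "BK-SDM-Tiny": (17.12, 30.09, 0.2653, 0.33),
}


def relative_to_baseline(results, baseline="Stable Diffusion v1.4"):
    """Absolute metric changes and relative U-Net size versus the baseline row."""
    b_fid, b_is, b_clip, b_unet = results[baseline]
    table = {}
    for name, (fid, inception, clip, unet) in results.items():
        table[name] = {
            "dFID": round(fid - b_fid, 2),        # positive means worse
            "dIS": round(inception - b_is, 2),    # negative means worse
            "dCLIP": round(clip - b_clip, 4),     # negative means worse
            "U-Net size %": round(100 * unet / b_unet, 1),
        }
    return table


for name, row in relative_to_baseline(RESULTS).items():
    print(name, row)
```

For BK-SDM-Small, for instance, this gives an FID 3.93 higher, an IS 5.08 lower and a CLIP score 0.0281 lower than the baseline, with a U-Net about 57% the size of the original.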