Upload README.md with huggingface_hub
README.md
CHANGED
Generates high resolution images from text prompts using a latent diffusion model. This model uses CLIP ViT-L/14 as the text encoder, U-Net based latent denoising, and a VAE based decoder to generate the final image.

This model is an implementation of Stable-Diffusion-v2.1 found [here]({source_repo}).

This repository provides scripts to run Stable-Diffusion-v2.1 on Qualcomm® devices.
More details on model performance across various devices can be found
[here](https://aihub.qualcomm.com/models/stable_diffusion_v2_1_quantized).
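The three-stage flow just described (text encoding, iterative U-Net denoising in latent space, VAE decoding) can be sketched schematically. The snippet below is a toy illustration with dummy NumPy stand-ins for the three components; it is not the actual model code or API from this repository:

```python
import numpy as np

# Dummy stand-ins for the three components named above. In the real pipeline
# these are the quantized CLIP ViT-L/14 text encoder, U-Net, and VAE decoder.
def text_encoder(prompt):
    # prompt -> deterministic dummy "text embedding" (77 tokens x 1024 dims)
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal((77, 1024))

def unet(latent, t, text_emb):
    # Predict the noise to remove at step t (dummy: a fraction of the latent).
    return 0.1 * latent

def vae_decoder(latent):
    # Upsample a 64x64x4 latent to a 512x512x3 "image" (dummy: nearest-neighbor).
    img = np.repeat(np.repeat(latent[..., :3], 8, axis=0), 8, axis=1)
    return np.clip(img, -1, 1)

def generate(prompt, steps=20, seed=0):
    emb = text_encoder(prompt)
    # Start from pure noise in latent space.
    latent = np.random.default_rng(seed).standard_normal((64, 64, 4))
    # Iterative denoising: one U-Net pass per step.
    for t in range(steps, 0, -1):
        noise_pred = unet(latent, t, emb)
        latent = latent - noise_pred  # simplistic update rule
    # Decode the final latent into the output image.
    return vae_decoder(latent)

image = generate("a photo of an astronaut riding a horse")
print(image.shape)  # prints (512, 512, 3)
```

The structure mirrors why the U-Net dominates runtime: the text encoder and VAE decoder each run once, while the U-Net runs once per denoising step.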
- VAE Decoder Number of parameters: 83M
- Model size: 1GB

| Model | Device | Chipset | Target Runtime | Inference Time (ms) | Peak Memory Range (MB) | Precision | Primary Compute Unit | Target Model |
|---|---|---|---|---|---|---|---|---|
| TextEncoder_Quantized | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 11.633 ms | 0 - 1 MB | INT8 | NPU | [Stable-Diffusion-v2.1.bin](https://huggingface.co/qualcomm/Stable-Diffusion-v2.1/blob/main/TextEncoder_Quantized.bin) |
| TextEncoder_Quantized | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 7.759 ms | 0 - 8 MB | INT8 | NPU | [Stable-Diffusion-v2.1.bin](https://huggingface.co/qualcomm/Stable-Diffusion-v2.1/blob/main/TextEncoder_Quantized.bin) |
| TextEncoder_Quantized | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 11.773 ms | 0 - 0 MB | INT8 | NPU | Use Export Script |
| TextEncoder_Quantized | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 10.7 ms | 0 - 1 MB | UINT16 | NPU | Use Export Script |
| VAEDecoder_Quantized | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 217.134 ms | 0 - 2 MB | INT8 | NPU | [Stable-Diffusion-v2.1.bin](https://huggingface.co/qualcomm/Stable-Diffusion-v2.1/blob/main/VAEDecoder_Quantized.bin) |
| VAEDecoder_Quantized | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 161.705 ms | 0 - 8 MB | INT8 | NPU | [Stable-Diffusion-v2.1.bin](https://huggingface.co/qualcomm/Stable-Diffusion-v2.1/blob/main/VAEDecoder_Quantized.bin) |
| VAEDecoder_Quantized | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 220.179 ms | 0 - 0 MB | INT8 | NPU | Use Export Script |
| VAEDecoder_Quantized | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 225.416 ms | 0 - 2 MB | UINT16 | NPU | Use Export Script |
| UNet_Quantized | Samsung Galaxy S23 | Snapdragon® 8 Gen 2 | QNN | 101.094 ms | 0 - 2 MB | INT8 | NPU | [Stable-Diffusion-v2.1.bin](https://huggingface.co/qualcomm/Stable-Diffusion-v2.1/blob/main/UNet_Quantized.bin) |
| UNet_Quantized | Samsung Galaxy S24 | Snapdragon® 8 Gen 3 | QNN | 72.62 ms | 0 - 8 MB | INT8 | NPU | [Stable-Diffusion-v2.1.bin](https://huggingface.co/qualcomm/Stable-Diffusion-v2.1/blob/main/UNet_Quantized.bin) |
| UNet_Quantized | Snapdragon X Elite CRD | Snapdragon® X Elite | QNN | 102.486 ms | 0 - 0 MB | INT8 | NPU | Use Export Script |
| UNet_Quantized | QCS8550 (Proxy) | QCS8550 Proxy | QNN | 96.631 ms | 1 - 2 MB | UINT16 | NPU | Use Export Script |
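As a rough sanity check on these numbers: end-to-end text-to-image latency is dominated by the U-Net, which runs once per denoising step, while the text encoder and VAE decoder each run once. A back-of-envelope sketch using the Samsung Galaxy S23 timings from the table; the denoising step counts are assumptions for illustration, not something this card states, and scheduler overhead or extra passes (e.g. for classifier-free guidance) are ignored:

```python
# Per-component inference times (ms) on Samsung Galaxy S23 (Snapdragon 8 Gen 2),
# taken from the device-performance table above.
TEXT_ENCODER_MS = 11.633
UNET_MS = 101.094
VAE_DECODER_MS = 217.134

def end_to_end_ms(steps: int) -> float:
    """Rough lower-bound estimate: one text-encoder pass, one U-Net pass per
    denoising step, one VAE decode."""
    return TEXT_ENCODER_MS + steps * UNET_MS + VAE_DECODER_MS

# Hypothetical 20- and 50-step schedules, purely for illustration.
for steps in (20, 50):
    print(f"{steps} steps: ~{end_to_end_ms(steps) / 1000:.1f} s")
```

This makes the trade-off visible: halving the step count roughly halves total latency, since the one-time encoder and decoder costs are small by comparison.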

## Installation

```bash
python -m qai_hub_models.models.stable_diffusion_v2_1_quantized.export
```

```
Profiling Results
------------------------------------------------------------
TextEncoder_Quantized
Device                          : Samsung Galaxy S23 (13)
Runtime                         : QNN
Estimated inference time (ms)   : 11.6
Estimated peak memory usage (MB): [0, 1]
Total # Ops                     : 1040
Compute Unit(s)                 : NPU (1040 ops)

------------------------------------------------------------
VAEDecoder_Quantized
Device                          : Samsung Galaxy S23 (13)
Runtime                         : QNN
Estimated inference time (ms)   : 217.1
Estimated peak memory usage (MB): [0, 2]
Total # Ops                     : 170
Compute Unit(s)                 : NPU (170 ops)

------------------------------------------------------------
UNet_Quantized
Device                          : Samsung Galaxy S23 (13)
Runtime                         : QNN
Estimated inference time (ms)   : 101.1
Estimated peak memory usage (MB): [0, 2]
Total # Ops                     : 6361
Compute Unit(s)                 : NPU (6361 ops)
```

Get more details on Stable-Diffusion-v2.1's performance across various devices [here](https://aihub.qualcomm.com/models/stable_diffusion_v2_1_quantized).
Explore all available models on [Qualcomm® AI Hub](https://aihub.qualcomm.com/).

## License
* The license for the original implementation of Stable-Diffusion-v2.1 can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE).
* The license for the compiled assets for on-device deployment can be found [here](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE).

## References
* [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)
* [Source Model Implementation](https://github.com/CompVis/stable-diffusion/tree/main)

## Community
* Join [our AI Hub Slack community](https://aihub.qualcomm.com/community/slack) to collaborate, post questions, and learn more about on-device AI.
* For questions or feedback, please [reach out to us](mailto:ai-hub-support@qti.qualcomm.com).