Commit 5f081b6 by bconsolvo Β· 1 Parent(s): fdf6aae

readme updates

Files changed (1): README.md (+43 βˆ’21)
---
language: en
license: creativeml-openrail-m
tags:
- stable-diffusion
- stable-diffusion-diffusers
- text-to-image
- RyzenAI
- Quantization
- ONNX
- Computer Vision
inference: true
---
# πŸš€ Stable Diffusion 1.5 on AMD AI PC NPU

"Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. For more information about how Stable Diffusion functions, please have a look at [πŸ€—'s Stable Diffusion blog](https://huggingface.co/blog/stable_diffusion)."
More details about this model can be found on the original Hugging Face model card: [stable-diffusion-v1-5/stable-diffusion-v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5).

This model repo contains the optimized ONNX models required to run the image generation pipeline for Stable Diffusion 1.5 on AMD NPUs.

## Model Details

The folder structure is organized to mirror the main components of the diffusion pipeline (scheduler, text encoder, tokenizer, UNet, and VAE decoder).

```text
β”œβ”€ scheduler/
β”œβ”€ text_encoder/
β”œβ”€ tokenizer/
β”œβ”€ unet/
└─ vae_decoder/
```

The [scheduler](scheduler) folder contains the scheduler configuration (timesteps, betas, alphas, etc.) used during the diffusion sampling process.
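
A scheduler config of this kind boils down to a beta schedule and the cumulative alpha products derived from it. As a rough sketch, the constants below are the settings commonly published for Stable Diffusion 1.5 (scaled-linear betas from 0.00085 to 0.012 over 1000 training timesteps), not values read from this repo's files:

```python
# Illustrative only: recompute a "scaled_linear" beta schedule and the
# cumulative alpha products a diffusion scheduler derives from it.
# Constants mirror SD 1.5's usual scheduler settings, not this repo's config.

def scaled_linear_betas(beta_start=0.00085, beta_end=0.012, n=1000):
    # "scaled_linear": interpolate linearly in sqrt(beta), then square.
    s, e = beta_start ** 0.5, beta_end ** 0.5
    return [(s + (e - s) * i / (n - 1)) ** 2 for i in range(n)]

def alphas_cumprod(betas):
    # Running product of (1 - beta_t); shrinks monotonically toward 0.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

betas = scaled_linear_betas()
acp = alphas_cumprod(betas)
print(len(betas), round(betas[0], 6), round(betas[-1], 6))  # 1000 0.00085 0.012
```

The sampler uses `acp[t]` to decide how much noise remains at timestep `t`, which is why the scheduler config, not the network weights, controls the denoising trajectory.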

The [text_encoder](text_encoder) folder contains the text encoder model used to convert the input prompt into conditioning embeddings for the diffusion model.
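
For SD 1.5 the text encoder is CLIP ViT-L/14, so its contract is simple: 77 token IDs in, one 768-dimensional hidden state per token position out. A stand-in that only models the shapes (the real computation is the ONNX graph in this folder):

```python
# Shape-level stand-in for the text encoder: 77 token IDs -> 77 x 768
# hidden states (CLIP ViT-L/14 hidden size). Values are dummies.

def fake_text_encoder(token_ids, seq_len=77, hidden=768):
    assert len(token_ids) == seq_len, "expects a fixed-length token sequence"
    return [[0.0] * hidden for _ in token_ids]

emb = fake_text_encoder([0] * 77)
print(len(emb), len(emb[0]))  # 77 768
```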

The [tokenizer](tokenizer) folder contains the tokenizer configuration and vocabulary files required to preprocess the text prompt before it is fed to the text encoder.
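
The part of that preprocessing worth knowing about is the fixed context length: prompts are wrapped in begin/end tokens, truncated, and padded to 77 positions. A toy sketch (the special-token IDs match the CLIP tokenizer used by Stable Diffusion, but the prompt IDs are made up; the authoritative vocabulary and settings are the files in this folder):

```python
# Toy sketch of CLIP-style prompt preprocessing: add BOS/EOS, truncate,
# and pad to the fixed 77-token context length. Prompt IDs are invented.

BOS, EOS, PAD, MAX_LEN = 49406, 49407, 49407, 77

def prepare_ids(token_ids):
    ids = [BOS] + token_ids[: MAX_LEN - 2] + [EOS]
    return ids + [PAD] * (MAX_LEN - len(ids))

ids = prepare_ids([320, 1125, 539, 550, 18376])  # hypothetical IDs for a short prompt
print(len(ids), ids[0], ids[6])  # 77 49406 49407
```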

The [unet](unet) folder contains the UNet model used in the diffusion process. The UNet is exported and structured specifically to leverage the AMD NPU accelerator for the denoising steps.
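
During sampling, the pipeline typically runs the UNet twice per timestep, once without conditioning and once with the prompt embeddings, then blends the two noise predictions with classifier-free guidance. A toy sketch of that blend, with plain Python lists standing in for tensors and 7.5 as an assumed default guidance scale:

```python
# Toy classifier-free guidance blend: eps = eps_uncond + s * (eps_text - eps_uncond).
# Real pipelines do this on tensors; lists stand in for them here.

def guided_noise(eps_uncond, eps_text, guidance_scale=7.5):
    return [u + guidance_scale * (t - u) for u, t in zip(eps_uncond, eps_text)]

print(guided_noise([0.0, 1.0], [1.0, 1.0], guidance_scale=2.0))  # [2.0, 1.0]
```

The scheduler then uses this blended prediction to step the latents toward the next timestep, which is why the UNet dominates the per-step cost and is the main target for NPU offload.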

The [vae_decoder](vae_decoder) folder contains the VAE decoder model used to map latent representations back to the image space. The VAE decoder is also structured to make use of the NPU accelerator for efficient image reconstruction.
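
The arithmetic around the decoder is worth spelling out: latents are divided by the VAE scaling factor before decoding, and the decoder's [-1, 1] output is mapped to 8-bit pixels. The 0.18215 factor is the value SD 1.5 pipelines conventionally use; the decode itself is replaced by a stand-in in this sketch:

```python
# Sketch of the pre- and post-processing around the VAE decoder.
# 0.18215 is the conventional SD 1.5 latent scaling factor; the actual
# decode (latents -> image tensor) is stood in for by a simple division.

SCALING_FACTOR = 0.18215

def postprocess(decoded):
    # [-1, 1] floats -> clamped [0, 255] integer pixel values.
    return [max(0, min(255, round((x / 2 + 0.5) * 255))) for x in decoded]

latents = [0.5, -0.2]
decoded = [x / SCALING_FACTOR for x in latents]  # stand-in for the real decode
print(postprocess([-1.0, 0.0, 1.0]))  # [0, 128, 255]
```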

> Note: UNet and VAE decoder models are optimized and structured to run on AMD NPUs. The other components (text encoder, tokenizer, and scheduler) are shared between GPU and NPU pipelines, but are provided here for completeness.

| Model Details | Description |
| ----------- | ----------- |
| Person or organization developing model | [Giovanni Guasti (AMD)](https://huggingface.co/gguasti), [Benjamin Consolvo (AMD)](https://huggingface.co/bconsolvo) |
| Original model authors | [Robin Rombach](https://huggingface.co/rromb), [Patrick Esser](https://huggingface.co/pesser) |
| Model date | January 2026 |
| Model version | 1.7.0 |
| Model type | Diffusion-based text-to-image generation model |
| Information about training algorithms, parameters, fairness constraints or other applied approaches, and features | This is a model that can be used to generate and modify images based on text prompts. It is a [Latent Diffusion Model](https://arxiv.org/abs/2112.10752) that uses a fixed, pretrained text encoder ([CLIP ViT-L/14](https://arxiv.org/abs/2103.00020)) as suggested in the [Imagen paper](https://arxiv.org/abs/2205.11487). |
| License | [CreativeML OpenRAIL-M](LICENSE) |
| Where to send questions or comments about the model | [Community Tab](https://hf.co/amd/stable-diffusion-1.5-amdnpu/discussions) and [AMD Developer Community Discord](https://discord.gg/amd-dev) |

## ⚑ Intended Use

### Getting Started

A comprehensive set of documentation is kept in the AMD SD-Sandbox GitHub repository: [github.com/amd/sd-sandbox](https://github.com/amd/sd-sandbox). Please refer to it to get started with this model.
 

## βš“ Ethical Considerations

AMD is committed to conducting our business in a fair, ethical, and honest manner and in compliance with all applicable laws, rules, and regulations. You can find out more at the [AMD Ethics and Compliance](https://www.amd.com/en/corporate/corporate-responsibility/ethics-and-compliance.html) page.
 
 

## ⚠️ Caveats and Recommendations

Please visit the original model card for more details: [stable-diffusion-v1-5/stable-diffusion-v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5).

## πŸ“Œ Citation Details

```bibtex
@InProceedings{Rombach_2022_CVPR,