Qalam committed on
Commit f4c6308 · 1 Parent(s): 1f6d86c

Update README.md

Files changed (1)
  1. README.md +472 -115
README.md CHANGED
@@ -1,206 +1,563 @@
- ---
- license: openrail
- datasets:
- - google/MusicCaps
- - openwebtext
- language:
- - en
- library_name: open_clip
- metrics:
- - accuracy
- - bertscore
- - bleurt
- - cer
- - chrf
- - code_eval
- pipeline_tag: text-to-image
- ---

- # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->

- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).

- # Model Details

- ## Model Description

- <!-- Provide a longer summary of what this model is. -->

- - **Developed by:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ## Model Sources [optional]

- <!-- Provide the basic links for the model. -->

- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

- # Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

- ## Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]

- ## Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

- ## Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

- [More Information Needed]

- # Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->

- [More Information Needed]

- ## Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

- ## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

- # Training Details

- ## Training Data

- <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ## Training Procedure [optional]

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- ### Preprocessing

- [More Information Needed]

- ### Speeds, Sizes, Times

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

- # Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->

- ## Testing Data, Factors & Metrics

- ### Testing Data

- <!-- This should link to a Data Card if possible. -->

- [More Information Needed]

- ### Factors

- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

- [More Information Needed]

- ### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- [More Information Needed]

- ## Results

- [More Information Needed]

- ### Summary

- # Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

- # Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]

- # Technical Specifications [optional]

- ## Model Architecture and Objective

- [More Information Needed]

- ## Compute Infrastructure

- [More Information Needed]

- ### Hardware

- [More Information Needed]

- ### Software

- [More Information Needed]

- # Citation [optional]

- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

- **BibTeX:**

- [More Information Needed]

- **APA:**

- [More Information Needed]

- # Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

- # More Information [optional]

- [More Information Needed]

- # Model Card Authors [optional]

- [More Information Needed]

- # Model Card Contact

- [More Information Needed]
+ <p align="center">
+     <br>
+     <img src="./docs/source/en/imgs/diffusers_library.jpg" width="400"/>
+     <br>
+ </p>
+ <p align="center">
+     <a href="https://github.com/huggingface/diffusers/blob/main/LICENSE">
+         <img alt="GitHub license" src="https://img.shields.io/github/license/huggingface/diffusers.svg?color=blue">
+     </a>
+     <a href="https://github.com/huggingface/diffusers/releases">
+         <img alt="GitHub release" src="https://img.shields.io/github/release/huggingface/diffusers.svg">
+     </a>
+     <a href="CODE_OF_CONDUCT.md">
+         <img alt="Contributor Covenant" src="https://img.shields.io/badge/Contributor%20Covenant-2.0-4baaaa.svg">
+     </a>
+ </p>

+ 🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves
+ as a modular toolbox for inference and training of diffusion models.

+ More precisely, 🤗 Diffusers offers:

+ - State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)). Check [this overview](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/README.md#pipelines-summary) to see all supported pipelines and their corresponding official papers.
+ - Various noise schedulers that can be used interchangeably for the preferred speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)).
+ - Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)).
+ - Training examples that show how to train diffusion models for the most popular tasks (see [examples](https://github.com/huggingface/diffusers/tree/main/examples), *e.g.* [unconditional-image-generation](https://github.com/huggingface/diffusers/tree/main/examples/unconditional_image_generation)).

+ ## Installation

+ ### For PyTorch

+ **With `pip`** (official package)

+ ```bash
+ pip install --upgrade diffusers[torch]
+ ```

+ **With `conda`** (maintained by the community)

+ ```sh
+ conda install -c conda-forge diffusers
+ ```

+ ### For Flax

+ **With `pip`**

+ ```bash
+ pip install --upgrade diffusers[flax]
+ ```

+ **Apple Silicon (M1/M2) support**

+ Please refer to [the documentation](https://huggingface.co/docs/diffusers/optimization/mps).

+ ## Contributing

+ We ❤️ contributions from the open-source community!
+ If you want to contribute to this library, please check out our [Contribution guide](https://github.com/huggingface/diffusers/blob/main/CONTRIBUTING.md).
+ You can look out for [issues](https://github.com/huggingface/diffusers/issues) you'd like to tackle to contribute to the library.
+ - See [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) for general opportunities to contribute
+ - See [New model/pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) to contribute exciting new diffusion models / diffusion pipelines
+ - See [New scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) to contribute new schedulers

+ Also, say 👋 in our public Discord channel <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>. We discuss the hottest trends about diffusion models, help each other with contributions and personal projects, or just hang out ☕.

+ ## Quickstart

+ In order to get started, we recommend taking a look at two notebooks:

+ - The [Getting started with Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines.
+   Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library.
+ - The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook, which summarizes diffusion model training methods. It takes a step-by-step approach to training a
+   diffusion model on an image dataset, with explanatory graphics.

+ ## Stable Diffusion is fully compatible with `diffusers`!

+ Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/), [LAION](https://laion.ai/) and [RunwayML](https://runwayml.com/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 4GB VRAM.
+ See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information.

+ ### Text-to-Image generation with Stable Diffusion

+ First, let's install the required packages:

+ ```bash
+ pip install --upgrade diffusers transformers accelerate
+ ```

+ We recommend running the model in [half-precision (`fp16`)](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/), as it almost always gives the same results as full
+ precision while being roughly twice as fast and requiring half the amount of GPU RAM.

+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline

+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")

+ prompt = "a photo of an astronaut riding a horse on mars"
+ image = pipe(prompt).images[0]
+ ```

+ #### Running the model locally

+ You can also simply download the model folder and pass the local path to the `StableDiffusionPipeline`.

+ ```bash
+ git lfs install
+ git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
+ ```

+ Assuming the folder is stored locally under `./stable-diffusion-v1-5`, you can run Stable Diffusion
+ as follows:

+ ```python
+ pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
+ pipe = pipe.to("cuda")

+ prompt = "a photo of an astronaut riding a horse on mars"
+ image = pipe(prompt).images[0]
+ ```

+ If you are limited by GPU memory, you might want to consider chunking the attention computation in addition
+ to using `fp16`.
+ The following snippet should require less than 4GB of VRAM.

+ ```python
+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")

+ prompt = "a photo of an astronaut riding a horse on mars"
+ pipe.enable_attention_slicing()
+ image = pipe(prompt).images[0]
+ ```

+ If you wish to use a different scheduler (e.g. DDIM, LMS, PNDM/PLMS), you can instantiate it
+ from the existing scheduler's config and assign it to the pipeline:

+ ```python
+ from diffusers import LMSDiscreteScheduler

+ pipe.scheduler = LMSDiscreteScheduler.from_config(pipe.scheduler.config)

+ prompt = "a photo of an astronaut riding a horse on mars"
+ image = pipe(prompt).images[0]

+ image.save("astronaut_rides_horse.png")
+ ```
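
+ Alternatively, you can pass the scheduler to `from_pretrained` directly. A minimal sketch, assuming the checkpoint stores its scheduler configuration in a `scheduler` subfolder, as the Stable Diffusion checkpoints on the Hub do:

+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler

+ # load the scheduler from the checkpoint's scheduler subfolder
+ scheduler = LMSDiscreteScheduler.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="scheduler")

+ # the pipeline then uses LMS from the start, no post-hoc swap needed
+ pipe = StableDiffusionPipeline.from_pretrained(
+     "runwayml/stable-diffusion-v1-5", scheduler=scheduler, torch_dtype=torch.float16
+ )
+ pipe = pipe.to("cuda")
+ ```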

+ If you want to run Stable Diffusion on the CPU, or want maximum precision on the GPU,
+ please run the model in the default *full-precision* setting:

+ ```python
+ from diffusers import StableDiffusionPipeline

+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

+ # remove the following line if you run on CPU
+ pipe = pipe.to("cuda")

+ prompt = "a photo of an astronaut riding a horse on mars"
+ image = pipe(prompt).images[0]

+ image.save("astronaut_rides_horse.png")
+ ```

+ ### JAX/Flax

+ Diffusers offers a JAX/Flax implementation of Stable Diffusion for very fast inference. JAX shines especially on TPU hardware, because each TPU server has 8 accelerators working in parallel, but it runs great on GPUs too.

+ Running the pipeline with the default PNDMScheduler:

+ ```python
+ import jax
+ import numpy as np
+ from flax.jax_utils import replicate
+ from flax.training.common_utils import shard

+ from diffusers import FlaxStableDiffusionPipeline

+ pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+     "runwayml/stable-diffusion-v1-5", revision="flax", dtype=jax.numpy.bfloat16
+ )

+ prompt = "a photo of an astronaut riding a horse on mars"

+ prng_seed = jax.random.PRNGKey(0)
+ num_inference_steps = 50

+ # replicate the prompt once per device
+ num_samples = jax.device_count()
+ prompt = num_samples * [prompt]
+ prompt_ids = pipeline.prepare_inputs(prompt)

+ # shard inputs and rng
+ params = replicate(params)
+ prng_seed = jax.random.split(prng_seed, jax.device_count())
+ prompt_ids = shard(prompt_ids)

+ images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
+ images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
+ ```
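
+ To keep the generated samples, a small usage sketch (the file names here are arbitrary): the call returns one PIL image per device, so they can be saved in a loop.

+ ```python
+ for i, img in enumerate(images):
+     img.save(f"astronaut_rides_horse_{i}.png")  # one image per TPU/GPU device
+ ```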

+ **Note**:
+ If you are limited by TPU memory, please make sure to load the `FlaxStableDiffusionPipeline` in `bfloat16` precision instead of the default `float32` precision as done above. You can do so by telling diffusers to load the weights from the `bf16` branch.

+ ```python
+ import jax
+ import numpy as np
+ from flax.jax_utils import replicate
+ from flax.training.common_utils import shard

+ from diffusers import FlaxStableDiffusionPipeline

+ pipeline, params = FlaxStableDiffusionPipeline.from_pretrained(
+     "runwayml/stable-diffusion-v1-5", revision="bf16", dtype=jax.numpy.bfloat16
+ )

+ prompt = "a photo of an astronaut riding a horse on mars"

+ prng_seed = jax.random.PRNGKey(0)
+ num_inference_steps = 50

+ num_samples = jax.device_count()
+ prompt = num_samples * [prompt]
+ prompt_ids = pipeline.prepare_inputs(prompt)

+ # shard inputs and rng
+ params = replicate(params)
+ prng_seed = jax.random.split(prng_seed, jax.device_count())
+ prompt_ids = shard(prompt_ids)

+ images = pipeline(prompt_ids, params, prng_seed, num_inference_steps, jit=True).images
+ images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
+ ```

+ Diffusers also has an Image-to-Image generation pipeline with Flax/JAX:

+ ```python
+ import jax
+ import numpy as np
+ import jax.numpy as jnp
+ from flax.jax_utils import replicate
+ from flax.training.common_utils import shard
+ import requests
+ from io import BytesIO
+ from PIL import Image
+ from diffusers import FlaxStableDiffusionImg2ImgPipeline

+ def create_key(seed=0):
+     return jax.random.PRNGKey(seed)

+ rng = create_key(0)

+ url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
+ response = requests.get(url)
+ init_img = Image.open(BytesIO(response.content)).convert("RGB")
+ init_img = init_img.resize((768, 512))

+ prompts = "A fantasy landscape, trending on artstation"

+ pipeline, params = FlaxStableDiffusionImg2ImgPipeline.from_pretrained(
+     "CompVis/stable-diffusion-v1-4",
+     revision="flax",
+     dtype=jnp.bfloat16,
+ )

+ num_samples = jax.device_count()
+ rng = jax.random.split(rng, jax.device_count())
+ prompt_ids, processed_image = pipeline.prepare_inputs(prompt=[prompts] * num_samples, image=[init_img] * num_samples)

+ # shard inputs and rng
+ p_params = replicate(params)
+ prompt_ids = shard(prompt_ids)
+ processed_image = shard(processed_image)

+ output = pipeline(
+     prompt_ids=prompt_ids,
+     image=processed_image,
+     params=p_params,
+     prng_seed=rng,
+     strength=0.75,
+     num_inference_steps=50,
+     jit=True,
+     height=512,
+     width=768,
+ ).images

+ output_images = pipeline.numpy_to_pil(np.asarray(output.reshape((num_samples,) + output.shape[-3:])))
+ ```

+ Diffusers also has a Text-guided inpainting pipeline with Flax/JAX:

+ ```python
+ import jax
+ import numpy as np
+ from flax.jax_utils import replicate
+ from flax.training.common_utils import shard
+ import PIL
+ import requests
+ from io import BytesIO

+ from diffusers import FlaxStableDiffusionInpaintPipeline

+ def download_image(url):
+     response = requests.get(url)
+     return PIL.Image.open(BytesIO(response.content)).convert("RGB")

+ img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
+ mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

+ init_image = download_image(img_url).resize((512, 512))
+ mask_image = download_image(mask_url).resize((512, 512))

+ pipeline, params = FlaxStableDiffusionInpaintPipeline.from_pretrained("xvjiarui/stable-diffusion-2-inpainting")

+ prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
+ prng_seed = jax.random.PRNGKey(0)
+ num_inference_steps = 50

+ num_samples = jax.device_count()
+ prompt = num_samples * [prompt]
+ init_image = num_samples * [init_image]
+ mask_image = num_samples * [mask_image]
+ prompt_ids, processed_masked_images, processed_masks = pipeline.prepare_inputs(prompt, init_image, mask_image)

+ # shard inputs and rng
+ params = replicate(params)
+ prng_seed = jax.random.split(prng_seed, jax.device_count())
+ prompt_ids = shard(prompt_ids)
+ processed_masked_images = shard(processed_masked_images)
+ processed_masks = shard(processed_masks)

+ images = pipeline(prompt_ids, processed_masks, processed_masked_images, params, prng_seed, num_inference_steps, jit=True).images
+ images = pipeline.numpy_to_pil(np.asarray(images.reshape((num_samples,) + images.shape[-3:])))
+ ```

+ ### Image-to-Image text-guided generation with Stable Diffusion

+ The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images.

+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from io import BytesIO

+ from diffusers import StableDiffusionImg2ImgPipeline

+ # load the pipeline
+ device = "cuda"
+ model_id_or_path = "runwayml/stable-diffusion-v1-5"
+ pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)

+ # or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
+ # and pass `model_id_or_path="./stable-diffusion-v1-5"`.
+ pipe = pipe.to(device)

+ # let's download an initial image
+ url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

+ response = requests.get(url)
+ init_image = Image.open(BytesIO(response.content)).convert("RGB")
+ init_image = init_image.resize((768, 512))

+ prompt = "A fantasy landscape, trending on artstation"

+ images = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images

+ images[0].save("fantasy_landscape.png")
+ ```

+ You can also run this example on Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb)

+ ### In-painting using Stable Diffusion

+ The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and a text prompt.

+ ```python
+ import PIL
+ import requests
+ import torch
+ from io import BytesIO

+ from diffusers import StableDiffusionInpaintPipeline

+ def download_image(url):
+     response = requests.get(url)
+     return PIL.Image.open(BytesIO(response.content)).convert("RGB")

+ img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
+ mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

+ init_image = download_image(img_url).resize((512, 512))
+ mask_image = download_image(mask_url).resize((512, 512))

+ pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")

+ prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
+ image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
+ ```

+ ### Tweak prompts reusing seeds and latents

+ You can generate your own latents to reproduce results, or tweak the prompt for a specific result you liked.
+ Please have a look at [Reusing seeds for deterministic generation](https://huggingface.co/docs/diffusers/main/en/using-diffusers/reusing_seeds).
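
+ As a quick illustration, a minimal sketch (not the full guide linked above): pinning the random state with a seeded `torch.Generator` makes the same prompt reproduce the same image.

+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline

+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")

+ # the fixed seed pins the initial latents; rerunning this yields the same image
+ generator = torch.Generator("cuda").manual_seed(0)
+ image = pipe("a photo of an astronaut riding a horse on mars", generator=generator).images[0]
+ ```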

+ ## Fine-Tuning Stable Diffusion

+ Fine-tuning techniques make it possible to adapt Stable Diffusion to your own dataset, or add new subjects to it. These are some of the techniques supported in `diffusers`:

+ - Textual Inversion. Capture novel concepts from a small set of sample images by learning new "words" in the embedding space of the pipeline's text encoder; these special words can then be used within text prompts for fine-grained control of the resulting images (a usage sketch follows this list). Please refer to [our training examples](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion) or [documentation](https://huggingface.co/docs/diffusers/training/text_inversion) to try it for yourself.
+ - Dreambooth. Another technique to capture new concepts in Stable Diffusion. This method fine-tunes the UNet (and, optionally, also the text encoder) of the pipeline to achieve impressive results. Please refer to [our training example](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) and [training report](https://huggingface.co/blog/dreambooth) for additional details and training recommendations.
+ - Full Stable Diffusion fine-tuning. If you have a more sizable dataset with a specific look or style, you can fine-tune Stable Diffusion so that it outputs images following those examples. This was the approach taken to create [a Pokémon Stable Diffusion model](https://huggingface.co/justinpinkney/pokemon-stable-diffusion) (by Justin Pinkney / Lambda Labs) and [a Japanese-specific version of Stable Diffusion](https://huggingface.co/spaces/rinna/japanese-stable-diffusion) (by [Rinna Co.](https://github.com/rinnakk/japanese-stable-diffusion/) and others). You can start at [our text-to-image fine-tuning example](https://github.com/huggingface/diffusers/tree/main/examples/text_to_image) and go from there.
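
+ For intuition, here is a hedged sketch of what using a learned Textual Inversion embedding at inference time can look like. It assumes you already have a `learned_embeds.bin` file mapping one placeholder token to its embedding, which is the format produced by the textual inversion training example; the file path is illustrative only.

+ ```python
+ import torch
+ from diffusers import StableDiffusionPipeline

+ pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
+ pipe = pipe.to("cuda")

+ learned = torch.load("learned_embeds.bin")  # {placeholder_token: embedding}
+ token, embedding = next(iter(learned.items()))

+ # register the new "word" and write its embedding into the text encoder
+ pipe.tokenizer.add_tokens(token)
+ pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
+ token_id = pipe.tokenizer.convert_tokens_to_ids(token)
+ pipe.text_encoder.get_input_embeddings().weight.data[token_id] = embedding.to(pipe.text_encoder.dtype)

+ image = pipe(f"a photo of {token} on the moon").images[0]
+ ```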

+ ## Stable Diffusion Community Pipelines

+ The release of Stable Diffusion as an open source model has fostered a lot of interesting ideas and experimentation.
+ Our [Community Examples folder](https://github.com/huggingface/diffusers/tree/main/examples/community) contains many ideas worth exploring, like interpolating to create animated videos, using CLIP Guidance for additional prompt fidelity, term weighting, and much more! [Take a look](https://huggingface.co/docs/diffusers/using-diffusers/custom_pipeline_overview) and [contribute your own](https://huggingface.co/docs/diffusers/using-diffusers/contribute_pipeline).
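
+ As a taste, a minimal sketch of loading a community pipeline by name via the `custom_pipeline` argument (the `lpw_stable_diffusion` example here, which adds long-prompt weighting, is just one assumed choice; see the linked overview for the full list and arguments):

+ ```python
+ from diffusers import DiffusionPipeline

+ # downloads the community pipeline code and runs Stable Diffusion through it
+ pipe = DiffusionPipeline.from_pretrained(
+     "runwayml/stable-diffusion-v1-5", custom_pipeline="lpw_stable_diffusion"
+ )
+ ```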

+ ## Other Examples

+ There are many ways to try running Diffusers! Here we outline code-focused tools (primarily using `DiffusionPipeline`s and Google Colab) and interactive web tools.

+ ### Running Code

+ If you want to run the code yourself 💻, you can try out:
+ - [Text-to-Image Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256)
+ ```python
+ # !pip install diffusers["torch"] transformers
+ from diffusers import DiffusionPipeline

+ device = "cuda"
+ model_id = "CompVis/ldm-text2im-large-256"

+ # load model and scheduler
+ ldm = DiffusionPipeline.from_pretrained(model_id)
+ ldm = ldm.to(device)

+ # run pipeline in inference (sample random noise and denoise)
+ prompt = "A painting of a squirrel eating a burger"
+ image = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6).images[0]

+ # save image
+ image.save("squirrel.png")
+ ```
+ - [Unconditional Diffusion with discrete scheduler](https://huggingface.co/google/ddpm-celebahq-256)
+ ```python
+ # !pip install diffusers["torch"]
+ from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline

+ model_id = "google/ddpm-celebahq-256"
+ device = "cuda"

+ # load model and scheduler
+ ddpm = DDPMPipeline.from_pretrained(model_id)  # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference (see the sketch after this list)
+ ddpm.to(device)

+ # run pipeline in inference (sample random noise and denoise)
+ image = ddpm().images[0]

+ # save image
+ image.save("ddpm_generated_image.png")
+ ```
+ - [Unconditional Latent Diffusion](https://huggingface.co/CompVis/ldm-celebahq-256)
+ - [Unconditional Diffusion with continuous scheduler](https://huggingface.co/google/ncsnpp-ffhq-1024)
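
+ Following up on the comment in the DDPM snippet above, a small sketch of the faster DDIM variant (the step count is an assumed, typical value):

+ ```python
+ from diffusers import DDIMPipeline

+ ddim = DDIMPipeline.from_pretrained("google/ddpm-celebahq-256")
+ ddim.to("cuda")

+ # DDIM can trade a little quality for speed with far fewer denoising steps
+ image = ddim(num_inference_steps=50).images[0]
+ image.save("ddim_generated_image.png")
+ ```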

+ **Other Image Notebooks**:
+ * [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)
+ * [tweak images via repeated Stable Diffusion seeds](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)

+ **Diffusers for Other Modalities**:
+ * [Molecule conformation generation](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/geodiff_molecule_conformation.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)
+ * [Model-based reinforcement learning](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/reinforcement_learning_with_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)

+ ### Web Demos

+ If you just want to play around with some web demos, you can try out the following 🚀 Spaces:

+ | Model | Hugging Face Spaces |
+ |---|---|
+ | Text-to-Image Latent Diffusion | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/text2img-latent-diffusion) |
+ | Faces generator | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/celeba-latent-diffusion) |
+ | DDPM with different schedulers | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/fusing/celeba-diffusion) |
+ | Conditional generation from sketch | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/huggingface/diffuse-the-rest) |
+ | Composable diffusion | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Shuang59/Composable-Diffusion) |

+ ## Definitions

+ **Models**: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to *denoise* a noisy input to an image.
+ *Examples*: UNet, Conditioned UNet, 3D UNet, Transformer UNet

+ <p align="center">
+     <img src="https://user-images.githubusercontent.com/10695622/174349667-04e9e485-793b-429a-affe-096e8199ad5b.png" width="800"/>
+     <br>
+     <em> Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
+ </p>

+ **Schedulers**: Algorithm class for both **inference** and **training**.
+ The class provides functionality to compute the previous image according to an alpha/beta schedule, as well as to predict noise for training. Also known as **Samplers**.
+ *Examples*: [DDPM](https://arxiv.org/abs/2006.11239), [DDIM](https://arxiv.org/abs/2010.02502), [PNDM](https://arxiv.org/abs/2202.09778), [DEIS](https://arxiv.org/abs/2204.13902)

+ <p align="center">
+     <img src="https://user-images.githubusercontent.com/10695622/174349706-53d58acc-a4d1-4cda-b3e8-432d9dc7ad38.png" width="800"/>
+     <br>
+     <em> Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239). </em>
+ </p>
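
+ To make the model/scheduler split concrete, a hedged sketch of a bare denoising loop (the checkpoint is an assumed example; the pipelines above wrap exactly this pattern):

+ ```python
+ import torch
+ from diffusers import UNet2DModel, DDPMScheduler

+ model = UNet2DModel.from_pretrained("google/ddpm-celebahq-256").to("cuda")
+ scheduler = DDPMScheduler.from_pretrained("google/ddpm-celebahq-256")

+ sample = torch.randn(1, 3, 256, 256, device="cuda")  # start from pure noise
+ # iterates over the scheduler's full schedule; slow but simple
+ for t in scheduler.timesteps:
+     with torch.no_grad():
+         noise_pred = model(sample, t).sample  # the model predicts the noise
+     sample = scheduler.step(noise_pred, t, sample).prev_sample  # the scheduler removes it
+ ```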

+ **Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models, possibly text encoders, and so on.
+ *Examples*: Glide, Latent-Diffusion, Imagen, DALL-E 2

+ <p align="center">
+     <img src="https://user-images.githubusercontent.com/10695622/174348898-481bd7c2-5457-4830-89bc-f0907756f64c.jpeg" width="550"/>
+     <br>
+     <em> Figure from Imagen (https://imagen.research.google/). </em>
+ </p>

+ ## Philosophy

+ - Readability and clarity are preferred over highly optimized code. Strong importance is placed on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper.
+ - Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continuous outputs**, *e.g.* vision and audio.
+ - Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation, and can include components of other libraries, such as text encoders. Examples of diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion).

+ ## In the works

+ For the first release, 🤗 Diffusers focuses on text-to-image diffusion techniques. However, diffusers can be used for much more than that! Over the upcoming releases, we'll be focusing on:

+ - Diffusers for audio
+ - Diffusers for reinforcement learning (initial work happening in https://github.com/huggingface/diffusers/pull/105)
+ - Diffusers for video generation
+ - Diffusers for molecule generation (initial work happening in https://github.com/huggingface/diffusers/pull/54)

+ A few pipeline components are already being worked on, namely:

+ - BDDMPipeline for spectrogram-to-sound vocoding
+ - GLIDEPipeline to support OpenAI's GLIDE model
+ - Grad-TTS for text-to-audio generation / conditional audio generation

+ We want diffusers to be a toolbox useful for diffusion models in general; if you find yourself limited in any way by the current API, or would like to see additional models, schedulers, or techniques, please open a [GitHub issue](https://github.com/huggingface/diffusers/issues) mentioning what you would like to see.

+ ## Credits

+ This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished as it is today:

+ - @CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion)
+ - @hojonathanho's original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion), as well as the extremely useful translation into PyTorch by @pesser, available [here](https://github.com/pesser/pytorch_diffusion)
+ - @ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim)
+ - @yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch)

+ We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models), as well as @crowsonkb and @rromb for useful discussions and insights.

+ ## Citation

+ ```bibtex
+ @misc{von-platen-etal-2022-diffusers,
+     author = {Patrick von Platen and Suraj Patil and Anton Lozhkov and Pedro Cuenca and Nathan Lambert and Kashif Rasul and Mishig Davaadorj and Thomas Wolf},
+     title = {Diffusers: State-of-the-art diffusion models},
+     year = {2022},
+     publisher = {GitHub},
+     journal = {GitHub repository},
+     howpublished = {\url{https://github.com/huggingface/diffusers}}
+ }
+ ```