# Stable Diffusion 2

Stable Diffusion 2 is a text-to-image _latent diffusion_ model built upon the work of the original [Stable Diffusion](https://stability.ai/blog/stable-diffusion-public-release), and it was led by Robin Rombach and Katherine Crowson from [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/).

*The Stable Diffusion 2.0 release includes robust text-to-image models trained using a brand new text encoder (OpenCLIP), developed by LAION with support from Stability AI, which greatly improves the quality of the generated images compared to earlier V1 releases. The text-to-image models in this release can generate images with default resolutions of both 512x512 pixels and 768x768 pixels.
These models are trained on an aesthetic subset of the [LAION-5B dataset](https://laion.ai/blog/laion-5b/) created by the DeepFloyd team at Stability AI, which is then further filtered to remove adult content using [LAION's NSFW filter](https://openreview.net/forum?id=M3Y74vmsMcY).*

For more details about how Stable Diffusion 2 works and how it differs from the original Stable Diffusion, please refer to the official [announcement post](https://stability.ai/blog/stable-diffusion-v2-release).

The architecture of Stable Diffusion 2 is nearly identical to the original [Stable Diffusion model](./text2img), so check out its API documentation for how to use Stable Diffusion 2. We recommend using the [`DPMSolverMultistepScheduler`] as it gives a reasonable speed/quality trade-off and can be run with as few as 20 steps.

Stable Diffusion 2 is available for tasks like text-to-image, inpainting, super-resolution, and depth-to-image:

| Task                    | Repository                                                                                                    |
|-------------------------|---------------------------------------------------------------------------------------------------------------|
| text-to-image (512x512) | [stabilityai/stable-diffusion-2-base](https://huggingface.co/stabilityai/stable-diffusion-2-base)             |
| text-to-image (768x768) | [stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2)                       |
| inpainting              | [stabilityai/stable-diffusion-2-inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) |
| super-resolution        | [stabilityai/stable-diffusion-x4-upscaler](https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler)   |
| depth-to-image          | [stabilityai/stable-diffusion-2-depth](https://huggingface.co/stabilityai/stable-diffusion-2-depth)           |

Here are some examples of how to use Stable Diffusion 2 for each task:

> [!TIP]
> Make sure to check out the Stable Diffusion [Tips](overview#tips) section to learn how to explore the tradeoff between scheduler speed and quality, and how to reuse pipeline components efficiently!
>
> If you're interested in using one of the official checkpoints for a task, explore the [CompVis](https://huggingface.co/CompVis) and [Stability AI](https://huggingface.co/stabilityai) Hub organizations!
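As a quick illustration of the component reuse mentioned in the tip above, a loaded pipeline's weights can be shared with a second pipeline instead of being loaded from disk twice. This is a minimal sketch; the image-to-image pipeline is chosen here only as an example of a second task:

```py
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

# load the text-to-image pipeline once
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-base", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# build an image-to-image pipeline from the same components;
# the weights are shared in memory, so nothing is downloaded or loaded again
img2img = StableDiffusionImg2ImgPipeline(**pipe.components)
```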
## Text-to-image

```py
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import torch

repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")

# swap in the recommended scheduler for a good speed/quality trade-off
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25).images[0]
image
```

## Inpainting

```py
import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import load_image, make_image_grid

# download the example image and its inpainting mask
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

repo_id = "stabilityai/stable-diffusion-2-inpainting"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=25).images[0]
make_image_grid([init_image, mask_image, image], rows=1, cols=3)
```

## Super-resolution

```py
from diffusers import StableDiffusionUpscalePipeline
from diffusers.utils import load_image, make_image_grid
import torch

# load model and scheduler
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

# let's download an image
url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd2-upscale/low_res_cat.png"
low_res_img = load_image(url)
low_res_img = low_res_img.resize((128, 128))

prompt = "a white cat"
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
make_image_grid([low_res_img.resize((512, 512)), upscaled_image.resize((512, 512))], rows=1, cols=2)
```

## Depth-to-image

```py
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)

prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipe(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```
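The text-to-image example above uses the 512x512 base checkpoint; the 768x768 checkpoint from the table follows the same API. A minimal sketch, assuming you want the model's native resolution (passing `height` and `width` explicitly is optional here, since this checkpoint defaults to 768x768):

```py
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import torch

# same API as the 512x512 base model, only the repo id changes
repo_id = "stabilityai/stable-diffusion-2"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

prompt = "High quality photo of an astronaut riding a horse in space"
# height/width spell out the checkpoint's default 768x768 output size
image = pipe(prompt, num_inference_steps=25, height=768, width=768).images[0]
image
```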