# aMUSEd

aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.

aMUSEd is a lightweight text-to-image model based on the [MUSE](https://huggingface.co/papers/2301.00704) architecture. It is particularly useful in applications that require a lightweight, fast model, such as generating many images at once.

aMUSEd is a VQ-VAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller CLIP-L/14 text encoder instead of T5-XXL. Thanks to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly, a benefit that is most pronounced at larger batch sizes.

The abstract from the paper is:

*We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.*
| Model | Params |
|-------|--------|
| [amused-256](https://huggingface.co/amused/amused-256) | 603M |
| [amused-512](https://huggingface.co/amused/amused-512) | 608M |
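aMUSEd's speed comes from MUSE-style parallel decoding: instead of one forward pass per token, it unmasks many VQ tokens at once and re-masks the least confident ones on a cosine schedule. The sketch below is illustrative only; the 16x16 token grid and the exact schedule shape are assumptions, not the pipeline's internal code.

```python
import math

def cosine_mask_schedule(num_tokens: int, num_steps: int) -> list[int]:
    """How many of `num_tokens` image tokens remain masked after each
    parallel-decoding step, under a MUSE-style cosine schedule."""
    return [
        int(num_tokens * math.cos(0.5 * math.pi * step / num_steps))
        for step in range(1, num_steps + 1)
    ]

# Assuming a 16x16 grid of VQ tokens (256 tokens) and 12 decoding steps
# (the pipelines' default `num_inference_steps`): every token is predicted
# by the final step, vs. 256 forward passes for autoregressive decoding.
schedule = cosine_mask_schedule(256, 12)
```

This is why a whole image costs only a dozen transformer passes, and why the speed advantage grows with batch size: each pass is amortized over the full batch.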
## AmusedPipeline[[diffusers.AmusedPipeline]]

#### diffusers.AmusedPipeline[[diffusers.AmusedPipeline]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused.py#L50)
#### __call__[[diffusers.AmusedPipeline.__call__]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused.py#L83)

The call function to the pipeline for generation.

- **prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **height** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The height in pixels of the generated image.
- **width** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The width in pixels of the generated image.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

The signature also accepts `num_images_per_prompt` (defaults to `1`), `generator`, `latents`, `prompt_embeds`, `encoder_hidden_states`, `negative_prompt_embeds`, `negative_encoder_hidden_states`, `output_type` (defaults to `"pil"`), `return_dict` (defaults to `True`), `callback`, `callback_steps` (defaults to `1`), `cross_attention_kwargs`, `micro_conditioning_aesthetic_score` (defaults to `6`), `micro_conditioning_crop_coord` (defaults to `(0, 0)`), and `temperature` (defaults to `(2, 0)`).

**Returns:**

[ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) or `tuple` -- If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) is returned, otherwise a `tuple` is returned where the first element is a list with the generated images.
Examples:

```py
>>> import torch
>>> from diffusers import AmusedPipeline

>>> pipe = AmusedPipeline.from_pretrained("amused/amused-512", variant="fp16", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
**Parameters:**

prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
height (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) : The height in pixels of the generated image.
width (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) : The width in pixels of the generated image.
num_inference_steps (`int`, *optional*, defaults to 12) : The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 10.0) : A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
negative_prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

#### enable_xformers_memory_efficient_attention[[diffusers.AmusedPipeline.enable_xformers_memory_efficient_attention]]

Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedent.
Examples:

```py
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

>>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
>>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
>>> # Workaround for not accepting attention shape using VAE for Flash Attention
>>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
```

**Parameters:**

attention_op (`Callable`, *optional*) : Override the default `None` operator for use as the `op` argument to the [`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention) function of xFormers.

#### disable_xformers_memory_efficient_attention[[diffusers.AmusedPipeline.disable_xformers_memory_efficient_attention]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/pipeline_utils.py#L2016)

Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).
#### diffusers.AmusedImg2ImgPipeline[[diffusers.AmusedImg2ImgPipeline]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L60)
#### __call__[[diffusers.AmusedImg2ImgPipeline.__call__]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L98)

The call function to the pipeline for generation.

- **prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) --
`Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of
tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays,
the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but
if latents are passed directly they are not encoded again.
- **strength** (`float`, *optional*, defaults to 0.5) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point, and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the
denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of
1 essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

The signature also accepts `num_images_per_prompt` (defaults to `1`), `generator`, `prompt_embeds`, `encoder_hidden_states`, `negative_prompt_embeds`, `negative_encoder_hidden_states`, `output_type` (defaults to `"pil"`), `return_dict` (defaults to `True`), `callback`, `callback_steps` (defaults to `1`), `cross_attention_kwargs`, `micro_conditioning_aesthetic_score` (defaults to `6`), `micro_conditioning_crop_coord` (defaults to `(0, 0)`), and `temperature` (defaults to `(2, 0)`).

**Returns:**

[ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) or `tuple` -- If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) is returned, otherwise a `tuple` is returned where the first element is a list with the generated images.
Examples:

```py
>>> import torch
>>> from diffusers import AmusedImg2ImgPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedImg2ImgPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "winter mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> image = pipe(prompt, input_image).images[0]
```
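The `strength` argument documented above controls how much of the denoising schedule actually runs on top of the input image. A rough sketch of that relationship (an illustrative approximation of the usual diffusers img2img convention, not this pipeline's exact code):

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps img2img actually runs:
    only the tail of the schedule, scaled by `strength`."""
    return min(int(num_inference_steps * strength), num_inference_steps)

# With the defaults (12 steps, strength=0.5) roughly half the schedule runs,
# keeping the result close to the input; strength=1.0 runs the full schedule
# and essentially ignores the input image.
```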
**Parameters:**

prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) : `Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays, the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but if latents are passed directly they are not encoded again.
strength (`float`, *optional*, defaults to 0.5) : Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point, and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of 1 essentially ignores `image`.
num_inference_steps (`int`, *optional*, defaults to 12) : The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 10.0) : A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
negative_prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

#### enable_xformers_memory_efficient_attention[[diffusers.AmusedImg2ImgPipeline.enable_xformers_memory_efficient_attention]]

Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedent.
Examples:

```py
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

>>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
>>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
>>> # Workaround for not accepting attention shape using VAE for Flash Attention
>>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
```

**Parameters:**

attention_op (`Callable`, *optional*) : Override the default `None` operator for use as the `op` argument to the [`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention) function of xFormers.

#### disable_xformers_memory_efficient_attention[[diffusers.AmusedImg2ImgPipeline.disable_xformers_memory_efficient_attention]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/pipeline_utils.py#L2016)

Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).
#### diffusers.AmusedInpaintPipeline[[diffusers.AmusedInpaintPipeline]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L68)
#### __call__[[diffusers.AmusedInpaintPipeline.__call__]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L114)

The call function to the pipeline for generation.

- **prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) --
`Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of
tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays,
the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but
if latents are passed directly they are not encoded again.
- **mask_image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) --
`Image`, numpy array, or tensor representing an image batch to mask `image`. White pixels in the mask
are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
single channel (luminance) before use. If it's a numpy array or pytorch tensor, it should contain one
color channel (L) instead of 3, so the expected shape for a pytorch tensor is `(B, 1, H, W)`,
`(B, H, W)`, `(1, H, W)`, or `(H, W)`, and for a numpy array `(B, H, W, 1)`, `(B, H, W)`, `(H, W, 1)`,
or `(H, W)`.
- **strength** (`float`, *optional*, defaults to 1.0) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point, and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the
denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of
1 essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

The signature also accepts `num_images_per_prompt` (defaults to `1`), `generator`, `prompt_embeds`, `encoder_hidden_states`, `negative_prompt_embeds`, `negative_encoder_hidden_states`, `output_type` (defaults to `"pil"`), `return_dict` (defaults to `True`), `callback`, `callback_steps` (defaults to `1`), `cross_attention_kwargs`, `micro_conditioning_aesthetic_score` (defaults to `6`), `micro_conditioning_crop_coord` (defaults to `(0, 0)`), and `temperature` (defaults to `(2, 0)`).

**Returns:**

[ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) or `tuple` -- If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) is returned, otherwise a `tuple` is returned where the first element is a list with the generated images.
Examples:

```py
>>> import torch
>>> from diffusers import AmusedInpaintPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedInpaintPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "fall mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> mask = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1_mask.png"
...     )
...     .resize((512, 512))
...     .convert("L")
... )
>>> pipe(prompt, input_image, mask).images[0].save("out.png")
```
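The `mask_image` argument expects white pixels where the pipeline should repaint and black pixels where it should preserve the input. Instead of loading a mask from disk as in the example above, you can also build one programmatically; a minimal sketch with Pillow (the rectangle coordinates are arbitrary, chosen only for illustration):

```python
from PIL import Image, ImageDraw

# Single-channel ("L") mask: black (0) = preserve, white (255) = repaint.
mask = Image.new("L", (512, 512), 0)
draw = ImageDraw.Draw(mask)
draw.rectangle([128, 128, 384, 384], fill=255)  # repaint the center region
```

The resulting `mask` can be passed directly as the third argument of the pipeline call, since it is already a single-channel PIL image of the right size.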
**Parameters:**

prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) : `Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays, the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but if latents are passed directly they are not encoded again.
mask_image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) : `Image`, numpy array, or tensor representing an image batch to mask `image`. White pixels in the mask are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a single channel (luminance) before use. If it's a numpy array or pytorch tensor, it should contain one color channel (L) instead of 3, so the expected shape for a pytorch tensor is `(B, 1, H, W)`, `(B, H, W)`, `(1, H, W)`, or `(H, W)`, and for a numpy array `(B, H, W, 1)`, `(B, H, W)`, `(H, W, 1)`, or `(H, W)`.
strength (`float`, *optional*, defaults to 1.0) : Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point, and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of 1 essentially ignores `image`.
num_inference_steps (`int`, *optional*, defaults to 12) : The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 10.0) : A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
negative_prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

#### enable_xformers_memory_efficient_attention[[diffusers.AmusedInpaintPipeline.enable_xformers_memory_efficient_attention]]

Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedent.
Examples:

```py
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

>>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
>>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
>>> # Workaround for not accepting attention shape using VAE for Flash Attention
>>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
```

**Parameters:**

attention_op (`Callable`, *optional*) : Override the default `None` operator for use as the `op` argument to the [`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention) function of xFormers.

#### disable_xformers_memory_efficient_attention[[diffusers.AmusedInpaintPipeline.disable_xformers_memory_efficient_attention]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/pipeline_utils.py#L2016)

Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).