# aMUSEd

aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.

aMUSEd is a lightweight text-to-image model based on the [MUSE](https://huggingface.co/papers/2301.00704) architecture. It is particularly useful in applications that require a lightweight, fast model, such as generating many images at once.

aMUSEd is a VQ-VAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller CLIP-L/14 text encoder instead of T5-XXL. Thanks to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly, a benefit that is most pronounced at larger batch sizes.

The abstract from the paper is:

*We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.*
| Model | Params |
|-------|--------|
| [amused-256](https://huggingface.co/amused/amused-256) | 603M |
| [amused-512](https://huggingface.co/amused/amused-512) | 608M |
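aMUSEd's speed comes from MUSE-style parallel decoding: instead of one forward pass per token, it unmasks many VQ tokens at once and re-masks the least confident ones on a cosine schedule. The sketch below is illustrative only; the 16x16 token grid and the exact schedule shape are assumptions, not the pipeline's internal code.

```python
import math

def cosine_mask_schedule(num_tokens: int, num_steps: int) -> list[int]:
    """How many of `num_tokens` image tokens remain masked after each
    parallel-decoding step, under a MUSE-style cosine schedule."""
    return [
        int(num_tokens * math.cos(0.5 * math.pi * step / num_steps))
        for step in range(1, num_steps + 1)
    ]

# Assuming a 16x16 grid of VQ tokens (256 tokens) and 12 decoding steps
# (the pipelines' default `num_inference_steps`): every token is predicted
# by the final step, vs. 256 forward passes for autoregressive decoding.
schedule = cosine_mask_schedule(256, 12)
```

This is why a whole image costs only a dozen transformer passes, and why the speed advantage grows with batch size: each pass is amortized over the full batch.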
## AmusedPipeline[[diffusers.AmusedPipeline]]

#### diffusers.AmusedPipeline[[diffusers.AmusedPipeline]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused.py#L50)
#### __call__[[diffusers.AmusedPipeline.__call__]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused.py#L83)

The call function to the pipeline for generation.

- **prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **height** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The height in pixels of the generated image.
- **width** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The width in pixels of the generated image.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

The signature also accepts `num_images_per_prompt` (defaults to `1`), `generator`, `latents`, `prompt_embeds`, `encoder_hidden_states`, `negative_prompt_embeds`, `negative_encoder_hidden_states`, `output_type` (defaults to `"pil"`), `return_dict` (defaults to `True`), `callback`, `callback_steps` (defaults to `1`), `cross_attention_kwargs`, `micro_conditioning_aesthetic_score` (defaults to `6`), `micro_conditioning_crop_coord` (defaults to `(0, 0)`), and `temperature` (defaults to `(2, 0)`).

**Returns:**

[ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) or `tuple` -- If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) is returned, otherwise a `tuple` is returned where the first element is a list with the generated images.
Examples:

```py
>>> import torch
>>> from diffusers import AmusedPipeline

>>> pipe = AmusedPipeline.from_pretrained("amused/amused-512", variant="fp16", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
**Parameters:**

prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
height (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) : The height in pixels of the generated image.
width (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) : The width in pixels of the generated image.
num_inference_steps (`int`, *optional*, defaults to 12) : The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 10.0) : A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
negative_prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

#### enable_xformers_memory_efficient_attention[[diffusers.AmusedPipeline.enable_xformers_memory_efficient_attention]]

Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedent.
Examples:

```py
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

>>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
>>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
>>> # Workaround for not accepting attention shape using VAE for Flash Attention
>>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
```

**Parameters:**

attention_op (`Callable`, *optional*) : Override the default `None` operator for use as the `op` argument to the [`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention) function of xFormers.

#### disable_xformers_memory_efficient_attention[[diffusers.AmusedPipeline.disable_xformers_memory_efficient_attention]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/pipeline_utils.py#L2016)

Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).
#### diffusers.AmusedImg2ImgPipeline[[diffusers.AmusedImg2ImgPipeline]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L60)
#### __call__[[diffusers.AmusedImg2ImgPipeline.__call__]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L98)

The call function to the pipeline for generation.

- **prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) --
`Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of
tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays,
the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but
if latents are passed directly they are not encoded again.
- **strength** (`float`, *optional*, defaults to 0.5) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point, and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the
denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of
1 essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

The signature also accepts `num_images_per_prompt` (defaults to `1`), `generator`, `prompt_embeds`, `encoder_hidden_states`, `negative_prompt_embeds`, `negative_encoder_hidden_states`, `output_type` (defaults to `"pil"`), `return_dict` (defaults to `True`), `callback`, `callback_steps` (defaults to `1`), `cross_attention_kwargs`, `micro_conditioning_aesthetic_score` (defaults to `6`), `micro_conditioning_crop_coord` (defaults to `(0, 0)`), and `temperature` (defaults to `(2, 0)`).

**Returns:**

[ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) or `tuple` -- If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) is returned, otherwise a `tuple` is returned where the first element is a list with the generated images.
Examples:

```py
>>> import torch
>>> from diffusers import AmusedImg2ImgPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedImg2ImgPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "winter mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> image = pipe(prompt, input_image).images[0]
```
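The `strength` argument documented above controls how much of the denoising schedule actually runs on top of the input image. A rough sketch of that relationship (an illustrative approximation of the usual diffusers img2img convention, not this pipeline's exact code):

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps img2img actually runs:
    only the tail of the schedule, scaled by `strength`."""
    return min(int(num_inference_steps * strength), num_inference_steps)

# With the defaults (12 steps, strength=0.5) roughly half the schedule runs,
# keeping the result close to the input; strength=1.0 runs the full schedule
# and essentially ignores the input image.
```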
**Parameters:**

prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) : `Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays, the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but if latents are passed directly they are not encoded again.
strength (`float`, *optional*, defaults to 0.5) : Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point, and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of 1 essentially ignores `image`.
num_inference_steps (`int`, *optional*, defaults to 12) : The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 10.0) : A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
negative_prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

#### enable_xformers_memory_efficient_attention[[diffusers.AmusedImg2ImgPipeline.enable_xformers_memory_efficient_attention]]

Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedent.
Examples:

```py
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

>>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
>>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
>>> # Workaround for not accepting attention shape using VAE for Flash Attention
>>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
```

**Parameters:**

attention_op (`Callable`, *optional*) : Override the default `None` operator for use as the `op` argument to the [`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention) function of xFormers.

#### disable_xformers_memory_efficient_attention[[diffusers.AmusedImg2ImgPipeline.disable_xformers_memory_efficient_attention]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/pipeline_utils.py#L2016)

Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).
#### diffusers.AmusedInpaintPipeline[[diffusers.AmusedInpaintPipeline]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L68)
#### __call__[[diffusers.AmusedInpaintPipeline.__call__]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L114)

The call function to the pipeline for generation.

- **prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) --
`Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of
tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays,
the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but
if latents are passed directly they are not encoded again.
- **mask_image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) --
`Image`, numpy array, or tensor representing an image batch to mask `image`. White pixels in the mask
are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
single channel (luminance) before use. If it's a numpy array or pytorch tensor, it should contain one
color channel (L) instead of 3, so the expected shape for a pytorch tensor is `(B, 1, H, W)`,
`(B, H, W)`, `(1, H, W)`, or `(H, W)`, and for a numpy array `(B, H, W, 1)`, `(B, H, W)`, `(H, W, 1)`,
or `(H, W)`.
- **strength** (`float`, *optional*, defaults to 1.0) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point, and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the
denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of
1 essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `list[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

The signature also accepts `num_images_per_prompt` (defaults to `1`), `generator`, `prompt_embeds`, `encoder_hidden_states`, `negative_prompt_embeds`, `negative_encoder_hidden_states`, `output_type` (defaults to `"pil"`), `return_dict` (defaults to `True`), `callback`, `callback_steps` (defaults to `1`), `cross_attention_kwargs`, `micro_conditioning_aesthetic_score` (defaults to `6`), `micro_conditioning_crop_coord` (defaults to `(0, 0)`), and `temperature` (defaults to `(2, 0)`).

**Returns:**

[ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) or `tuple` -- If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12652/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) is returned, otherwise a `tuple` is returned where the first element is a list with the generated images.
Examples:

```py
>>> import torch
>>> from diffusers import AmusedInpaintPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedInpaintPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "fall mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> mask = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1_mask.png"
...     )
...     .resize((512, 512))
...     .convert("L")
... )
>>> pipe(prompt, input_image, mask).images[0].save("out.png")
```
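The `mask_image` argument expects white pixels where the pipeline should repaint and black pixels where it should preserve the input. Instead of loading a mask from disk as in the example above, you can also build one programmatically; a minimal sketch with Pillow (the rectangle coordinates are arbitrary, chosen only for illustration):

```python
from PIL import Image, ImageDraw

# Single-channel ("L") mask: black (0) = preserve, white (255) = repaint.
mask = Image.new("L", (512, 512), 0)
draw = ImageDraw.Draw(mask)
draw.rectangle([128, 128, 384, 384], fill=255)  # repaint the center region
```

The resulting `mask` can be passed directly as the third argument of the pipeline call, since it is already a single-channel PIL image of the right size.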
**Parameters:**

prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) : `Image`, numpy array, or tensor representing an image batch to be used as the starting point. For both numpy arrays and pytorch tensors, the expected value range is `[0, 1]`. If it's a tensor or a list of tensors, the expected shape is `(B, C, H, W)` or `(C, H, W)`. If it's a numpy array or a list of arrays, the expected shape is `(B, H, W, C)` or `(H, W, C)`. It can also accept image latents as `image`, but if latents are passed directly they are not encoded again.
mask_image (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `list[torch.Tensor]`, `list[PIL.Image.Image]`, or `list[np.ndarray]`) : `Image`, numpy array, or tensor representing an image batch to mask `image`. White pixels in the mask are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a single channel (luminance) before use. If it's a numpy array or pytorch tensor, it should contain one color channel (L) instead of 3, so the expected shape for a pytorch tensor is `(B, 1, H, W)`, `(B, H, W)`, `(1, H, W)`, or `(H, W)`, and for a numpy array `(B, H, W, 1)`, `(B, H, W)`, `(H, W, 1)`, or `(H, W)`.
strength (`float`, *optional*, defaults to 1.0) : Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a starting point, and more noise is added the higher the `strength`. The number of denoising steps depends on the amount of noise initially added. When `strength` is 1, the added noise is maximal and the denoising process runs for the full number of iterations specified in `num_inference_steps`; a value of 1 essentially ignores `image`.
num_inference_steps (`int`, *optional*, defaults to 12) : The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (`float`, *optional*, defaults to 10.0) : A higher guidance scale value encourages the model to generate images closely linked to the text `prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
negative_prompt (`str` or `list[str]`, *optional*) : The prompt or prompts to guide what to not include in image generation. If not defined, you need to pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale < 1`).

#### enable_xformers_memory_efficient_attention[[diffusers.AmusedInpaintPipeline.enable_xformers_memory_efficient_attention]]

Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedent.
Examples:

```py
>>> import torch
>>> from diffusers import DiffusionPipeline
>>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

>>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
>>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
>>> # Workaround for not accepting attention shape using VAE for Flash Attention
>>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
```

**Parameters:**

attention_op (`Callable`, *optional*) : Override the default `None` operator for use as the `op` argument to the [`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention) function of xFormers.

#### disable_xformers_memory_efficient_attention[[diffusers.AmusedInpaintPipeline.disable_xformers_memory_efficient_attention]]

[Source](https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/pipelines/pipeline_utils.py#L2016)

Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/).