# aMUSEd

aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.

aMUSEd is a lightweight text-to-image model based on the [MUSE](https://huggingface.co/papers/2301.00704) architecture. It is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once.

aMUSEd is a VQ-VAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly. This benefit is seen particularly at larger batch sizes.

The abstract from the paper is:

*We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.*

| Model | Params |
|-------|--------|
| [amused-256](https://huggingface.co/amused/amused-256) | 603M |
| [amused-512](https://huggingface.co/amused/amused-512) | 608M |
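Because generation takes only a handful of transformer forward passes, the speedup is most visible when many images are sampled per call. A minimal sketch (the checkpoint, prompt, and batch size below are illustrative choices, not recommendations):

```py
import torch
from diffusers import AmusedPipeline

# Load the smaller 256x256 checkpoint in fp16 for fast sampling.
pipe = AmusedPipeline.from_pretrained("amused/amused-256", variant="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# One call produces a whole batch; the transformer still runs only
# `num_inference_steps` (default 12) forward passes for the entire batch.
images = pipe("a photo of an astronaut riding a horse on mars", num_images_per_prompt=8).images
```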
## AmusedPipeline[[diffusers.AmusedPipeline]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.AmusedPipeline</name><anchor>diffusers.AmusedPipeline</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused.py#L50</source><parameters>[{"name": "vqvae", "val": ": VQModel"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "text_encoder", "val": ": CLIPTextModelWithProjection"}, {"name": "transformer", "val": ": UVit2DModel"}, {"name": "scheduler", "val": ": AmusedScheduler"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__call__</name><anchor>diffusers.AmusedPipeline.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused.py#L83</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "height", "val": ": typing.Optional[int] = None"}, {"name": "width", "val": ": typing.Optional[int] = None"}, {"name": "num_inference_steps", "val": ": int = 12"}, {"name": "guidance_scale", "val": ": float = 10.0"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "num_images_per_prompt", "val": ": typing.Optional[int] = 1"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "latents", "val": ": typing.Optional[torch.IntTensor] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "output_type", "val": " = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "callback", "val": ": typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None"}, {"name": "callback_steps", "val": ": int = 1"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "micro_conditioning_aesthetic_score", "val": ": int = 6"}, {"name": "micro_conditioning_crop_coord", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "temperature", "val": ": typing.Union[int, typing.Tuple[int, int], typing.List[int]] = (2, 0)"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **height** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The height in pixels of the generated image.
- **width** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The width in pixels of the generated image.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale <= 1`).
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
The number of images to generate per prompt.
- **generator** (`torch.Generator`, *optional*) --
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic.
- **latents** (`torch.IntTensor`, *optional*) --
Pre-generated tokens representing latent vectors in `self.vqvae`, to be used as inputs for image
generation. If not provided, the starting latents will be completely masked.
- **prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. A single vector from the
pooled and projected final hidden states (see the sketch after the example below).
- **encoder_hidden_states** (`torch.Tensor`, *optional*) --
Pre-generated penultimate hidden states from the text encoder providing additional text conditioning.
- **negative_prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
- **negative_encoder_hidden_states** (`torch.Tensor`, *optional*) --
Analogous to `encoder_hidden_states` for the positive prompt.
- **output_type** (`str`, *optional*, defaults to `"pil"`) --
The output format of the generated image. Choose between `PIL.Image` and `np.array`.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return an [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) instead of a
plain tuple.
- **callback** (`Callable`, *optional*) --
A function called every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
- **callback_steps** (`int`, *optional*, defaults to 1) --
The frequency at which the `callback` function is called. If not specified, the callback is called at
every step.
- **cross_attention_kwargs** (`dict`, *optional*) --
A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined in
[`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
- **micro_conditioning_aesthetic_score** (`int`, *optional*, defaults to 6) --
The targeted aesthetic score according to the LAION aesthetic classifier. See
https://laion.ai/blog/laion-aesthetics/ and the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **micro_conditioning_crop_coord** (`Tuple[int]`, *optional*, defaults to (0, 0)) --
The targeted height, width crop coordinates. See the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **temperature** (`Union[int, Tuple[int, int], List[int]]`, *optional*, defaults to (2, 0)) --
Configures the temperature scheduler on `self.scheduler`; see `AmusedScheduler#set_timesteps`.</paramsdesc><paramgroups>0</paramgroups><rettype>[ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) or `tuple`</rettype><retdesc>If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) is returned, otherwise a
`tuple` is returned where the first element is a list with the generated images.</retdesc></docstring>
The call function to the pipeline for generation.
<ExampleCodeBlock anchor="diffusers.AmusedPipeline.__call__.example">
Examples:
```py
>>> import torch
>>> from diffusers import AmusedPipeline

>>> pipe = AmusedPipeline.from_pretrained("amused/amused-512", variant="fp16", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
</ExampleCodeBlock>
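The `prompt_embeds`/`encoder_hidden_states` pair (and their negative counterparts) can also be precomputed once and reused across calls. The sketch below is an assumption about suitable preprocessing rather than the pipeline's exact internals: it takes the pooled, projected `text_embeds` as `prompt_embeds` and the penultimate hidden states as `encoder_hidden_states`, mirroring the parameter descriptions above, and reuses the `pipe` from the example.

```py
def encode(pipe, text):
    # Tokenizer settings here are an assumption for illustration,
    # not necessarily identical to the pipeline's own preprocessing.
    inputs = pipe.tokenizer(
        text,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=pipe.tokenizer.model_max_length,
    ).to(pipe.device)
    outputs = pipe.text_encoder(**inputs, return_dict=True, output_hidden_states=True)
    # Pooled and projected final hidden state -> `prompt_embeds`;
    # penultimate hidden states -> `encoder_hidden_states`.
    return outputs.text_embeds, outputs.hidden_states[-2]


prompt_embeds, encoder_hidden_states = encode(pipe, "a photo of an astronaut riding a horse on mars")
negative_prompt_embeds, negative_encoder_hidden_states = encode(pipe, "")

image = pipe(
    prompt_embeds=prompt_embeds,
    encoder_hidden_states=encoder_hidden_states,
    negative_prompt_embeds=negative_prompt_embeds,
    negative_encoder_hidden_states=negative_encoder_hidden_states,
).images[0]
```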
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedPipeline.enable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1921</source><parameters>[{"name": "attention_op", "val": ": typing.Optional[typing.Callable] = None"}]</parameters><paramsdesc>- **attention_op** (`Callable`, *optional*) --
Override the default `None` operator for use as `op` argument to the
[`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention)
function of xFormers.</paramsdesc><paramgroups>0</paramgroups></docstring>
Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). When this
option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed
up during training is not guaranteed.

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
| <ExampleCodeBlock anchor="diffusers.AmusedPipeline.enable_xformers_memory_efficient_attention.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import DiffusionPipeline | |
| >>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp | |
| >>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16) | |
| >>> pipe = pipe.to("cuda") | |
| >>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp) | |
| >>> # Workaround for not accepting attention shape using VAE for Flash Attention | |
| >>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedPipeline.disable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1952</source><parameters>[]</parameters></docstring> | |
| Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). | |
| </div></div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
<docstring><name>class diffusers.AmusedImg2ImgPipeline</name><anchor>diffusers.AmusedImg2ImgPipeline</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L60</source><parameters>[{"name": "vqvae", "val": ": VQModel"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "text_encoder", "val": ": CLIPTextModelWithProjection"}, {"name": "transformer", "val": ": UVit2DModel"}, {"name": "scheduler", "val": ": AmusedScheduler"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__call__</name><anchor>diffusers.AmusedImg2ImgPipeline.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L98</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None"}, {"name": "strength", "val": ": float = 0.5"}, {"name": "num_inference_steps", "val": ": int = 12"}, {"name": "guidance_scale", "val": ": float = 10.0"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "num_images_per_prompt", "val": ": typing.Optional[int] = 1"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "output_type", "val": " = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "callback", "val": ": typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None"}, {"name": "callback_steps", "val": ": int = 1"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "micro_conditioning_aesthetic_score", "val": ": int = 6"}, {"name": "micro_conditioning_crop_coord", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "temperature", "val": ": typing.Union[int, typing.Tuple[int, int], typing.List[int]] = (2, 0)"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) --
`Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is between `[0, 1]`. If it's a tensor or a list
of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image
latents as `image`, but if passing latents directly they are not encoded again.
- **strength** (`float`, *optional*, defaults to 0.5) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising
process runs for the full number of iterations specified in `num_inference_steps`. A value of 1
essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale <= 1`).
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
The number of images to generate per prompt.
- **generator** (`torch.Generator`, *optional*) --
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic.
- **prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. A single vector from the
pooled and projected final hidden states.
- **encoder_hidden_states** (`torch.Tensor`, *optional*) --
Pre-generated penultimate hidden states from the text encoder providing additional text conditioning.
- **negative_prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
- **negative_encoder_hidden_states** (`torch.Tensor`, *optional*) --
Analogous to `encoder_hidden_states` for the positive prompt.
- **output_type** (`str`, *optional*, defaults to `"pil"`) --
The output format of the generated image. Choose between `PIL.Image` and `np.array`.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return an [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) instead of a
plain tuple.
- **callback** (`Callable`, *optional*) --
A function called every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
- **callback_steps** (`int`, *optional*, defaults to 1) --
The frequency at which the `callback` function is called. If not specified, the callback is called at
every step.
- **cross_attention_kwargs** (`dict`, *optional*) --
A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined in
[`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
- **micro_conditioning_aesthetic_score** (`int`, *optional*, defaults to 6) --
The targeted aesthetic score according to the LAION aesthetic classifier. See
https://laion.ai/blog/laion-aesthetics/ and the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **micro_conditioning_crop_coord** (`Tuple[int]`, *optional*, defaults to (0, 0)) --
The targeted height, width crop coordinates. See the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **temperature** (`Union[int, Tuple[int, int], List[int]]`, *optional*, defaults to (2, 0)) --
Configures the temperature scheduler on `self.scheduler`; see `AmusedScheduler#set_timesteps`.</paramsdesc><paramgroups>0</paramgroups><rettype>[ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) or `tuple`</rettype><retdesc>If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) is returned, otherwise a
`tuple` is returned where the first element is a list with the generated images.</retdesc></docstring>
The call function to the pipeline for generation.
<ExampleCodeBlock anchor="diffusers.AmusedImg2ImgPipeline.__call__.example">
Examples:
```py
>>> import torch
>>> from diffusers import AmusedImg2ImgPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedImg2ImgPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "winter mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> image = pipe(prompt, input_image).images[0]
```
</ExampleCodeBlock>
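Lowering `strength` preserves more of the input image. Continuing the example above with an illustrative value:

```py
# strength=0.3 (an arbitrary example value) adds less noise, so the result
# stays closer to `input_image` than the default strength of 0.5.
image = pipe(prompt, input_image, strength=0.3).images[0]
```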
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedImg2ImgPipeline.enable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1921</source><parameters>[{"name": "attention_op", "val": ": typing.Optional[typing.Callable] = None"}]</parameters><paramsdesc>- **attention_op** (`Callable`, *optional*) --
Override the default `None` operator for use as `op` argument to the
[`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention)
function of xFormers.</paramsdesc><paramgroups>0</paramgroups></docstring>
Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). When this
option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed
up during training is not guaranteed.

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
| <ExampleCodeBlock anchor="diffusers.AmusedImg2ImgPipeline.enable_xformers_memory_efficient_attention.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import DiffusionPipeline | |
| >>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp | |
| >>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16) | |
| >>> pipe = pipe.to("cuda") | |
| >>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp) | |
| >>> # Workaround for not accepting attention shape using VAE for Flash Attention | |
| >>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedImg2ImgPipeline.disable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1952</source><parameters>[]</parameters></docstring> | |
| Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). | |
| </div></div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
<docstring><name>class diffusers.AmusedInpaintPipeline</name><anchor>diffusers.AmusedInpaintPipeline</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L68</source><parameters>[{"name": "vqvae", "val": ": VQModel"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "text_encoder", "val": ": CLIPTextModelWithProjection"}, {"name": "transformer", "val": ": UVit2DModel"}, {"name": "scheduler", "val": ": AmusedScheduler"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__call__</name><anchor>diffusers.AmusedInpaintPipeline.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L114</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None"}, {"name": "mask_image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None"}, {"name": "strength", "val": ": float = 1.0"}, {"name": "num_inference_steps", "val": ": int = 12"}, {"name": "guidance_scale", "val": ": float = 10.0"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "num_images_per_prompt", "val": ": typing.Optional[int] = 1"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "output_type", "val": " = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "callback", "val": ": typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None"}, {"name": "callback_steps", "val": ": int = 1"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "micro_conditioning_aesthetic_score", "val": ": int = 6"}, {"name": "micro_conditioning_crop_coord", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "temperature", "val": ": typing.Union[int, typing.Tuple[int, int], typing.List[int]] = (2, 0)"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) --
`Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is between `[0, 1]`. If it's a tensor or a list
of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image
latents as `image`, but if passing latents directly they are not encoded again.
- **mask_image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) --
`Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask
are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
single channel (luminance) before use. If it's a numpy array or pytorch tensor, it should contain one
color channel (L) instead of 3, so the expected shape for a pytorch tensor would be `(B, 1, H, W)`, `(B,
H, W)`, `(1, H, W)`, or `(H, W)`, and for a numpy array `(B, H, W, 1)`, `(B, H, W)`, `(H, W,
1)`, or `(H, W)`.
- **strength** (`float`, *optional*, defaults to 1.0) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising
process runs for the full number of iterations specified in `num_inference_steps`. A value of 1
essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale <= 1`).
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
The number of images to generate per prompt.
- **generator** (`torch.Generator`, *optional*) --
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic.
- **prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. A single vector from the
pooled and projected final hidden states.
- **encoder_hidden_states** (`torch.Tensor`, *optional*) --
Pre-generated penultimate hidden states from the text encoder providing additional text conditioning.
- **negative_prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
- **negative_encoder_hidden_states** (`torch.Tensor`, *optional*) --
Analogous to `encoder_hidden_states` for the positive prompt.
- **output_type** (`str`, *optional*, defaults to `"pil"`) --
The output format of the generated image. Choose between `PIL.Image` and `np.array`.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return an [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) instead of a
plain tuple.
- **callback** (`Callable`, *optional*) --
A function called every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
- **callback_steps** (`int`, *optional*, defaults to 1) --
The frequency at which the `callback` function is called. If not specified, the callback is called at
every step.
- **cross_attention_kwargs** (`dict`, *optional*) --
A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined in
[`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
- **micro_conditioning_aesthetic_score** (`int`, *optional*, defaults to 6) --
The targeted aesthetic score according to the LAION aesthetic classifier. See
https://laion.ai/blog/laion-aesthetics/ and the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **micro_conditioning_crop_coord** (`Tuple[int]`, *optional*, defaults to (0, 0)) --
The targeted height, width crop coordinates. See the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **temperature** (`Union[int, Tuple[int, int], List[int]]`, *optional*, defaults to (2, 0)) --
Configures the temperature scheduler on `self.scheduler`; see `AmusedScheduler#set_timesteps`.</paramsdesc><paramgroups>0</paramgroups><rettype>[ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) or `tuple`</rettype><retdesc>If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) is returned, otherwise a
`tuple` is returned where the first element is a list with the generated images.</retdesc></docstring>
The call function to the pipeline for generation.
<ExampleCodeBlock anchor="diffusers.AmusedInpaintPipeline.__call__.example">
Examples:
```py
>>> import torch
>>> from diffusers import AmusedInpaintPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedInpaintPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "fall mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> mask = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1_mask.png"
...     )
...     .resize((512, 512))
...     .convert("L")
... )
>>> pipe(prompt, input_image, mask).images[0].save("out.png")
```
</ExampleCodeBlock>
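To monitor progress, pass a `callback` with the `(step, timestep, latents)` signature documented above. Continuing the example, a hypothetical logging callback:

```py
def log_step(step, timestep, latents):
    # `latents` is the current latent token grid as a torch.Tensor.
    print(f"step {step} (timestep {timestep}): latents shape {tuple(latents.shape)}")


# Invoke the callback every 4 steps instead of every step.
image = pipe(prompt, input_image, mask, callback=log_step, callback_steps=4).images[0]
```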
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedInpaintPipeline.enable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1921</source><parameters>[{"name": "attention_op", "val": ": typing.Optional[typing.Callable] = None"}]</parameters><paramsdesc>- **attention_op** (`Callable`, *optional*) --
Override the default `None` operator for use as `op` argument to the
[`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention)
function of xFormers.</paramsdesc><paramgroups>0</paramgroups></docstring>
Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). When this
option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed
up during training is not guaranteed.

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
| <ExampleCodeBlock anchor="diffusers.AmusedInpaintPipeline.enable_xformers_memory_efficient_attention.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import DiffusionPipeline | |
| >>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp | |
| >>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16) | |
| >>> pipe = pipe.to("cuda") | |
| >>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp) | |
| >>> # Workaround for not accepting attention shape using VAE for Flash Attention | |
| >>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedInpaintPipeline.disable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1952</source><parameters>[]</parameters></docstring> | |
| Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). | |
| </div></div> | |