# aMUSEd

aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.

aMUSEd is a lightweight text-to-image model based on the [MUSE](https://huggingface.co/papers/2301.00704) architecture. It is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once.

aMUSEd is a VQ-VAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Due to its small parameter count and few-forward-pass generation process, aMUSEd can generate many images quickly. This benefit is seen particularly at larger batch sizes.

The abstract from the paper is:

*We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.*

| Model | Params |
|-------|--------|
| [amused-256](https://huggingface.co/amused/amused-256) | 603M |
| [amused-512](https://huggingface.co/amused/amused-512) | 608M |
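Because generation takes only a handful of transformer forward passes, the speedup is most visible when many images are sampled per call. A minimal sketch (the checkpoint, prompt, and batch size below are illustrative choices, not recommendations):

```py
import torch
from diffusers import AmusedPipeline

# Load the smaller 256x256 checkpoint in fp16 for fast sampling.
pipe = AmusedPipeline.from_pretrained("amused/amused-256", variant="fp16", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# One call produces a whole batch; the transformer still runs only
# `num_inference_steps` (default 12) forward passes for the entire batch.
images = pipe("a photo of an astronaut riding a horse on mars", num_images_per_prompt=8).images
```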
## AmusedPipeline[[diffusers.AmusedPipeline]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.AmusedPipeline</name><anchor>diffusers.AmusedPipeline</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused.py#L50</source><parameters>[{"name": "vqvae", "val": ": VQModel"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "text_encoder", "val": ": CLIPTextModelWithProjection"}, {"name": "transformer", "val": ": UVit2DModel"}, {"name": "scheduler", "val": ": AmusedScheduler"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__call__</name><anchor>diffusers.AmusedPipeline.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused.py#L83</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "height", "val": ": typing.Optional[int] = None"}, {"name": "width", "val": ": typing.Optional[int] = None"}, {"name": "num_inference_steps", "val": ": int = 12"}, {"name": "guidance_scale", "val": ": float = 10.0"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "num_images_per_prompt", "val": ": typing.Optional[int] = 1"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "latents", "val": ": typing.Optional[torch.IntTensor] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "output_type", "val": " = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "callback", "val": ": typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None"}, {"name": "callback_steps", "val": ": int = 1"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "micro_conditioning_aesthetic_score", "val": ": int = 6"}, {"name": "micro_conditioning_crop_coord", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "temperature", "val": ": typing.Union[int, typing.Tuple[int, int], typing.List[int]] = (2, 0)"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **height** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The height in pixels of the generated image.
- **width** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
The width in pixels of the generated image.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale <= 1`).
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
The number of images to generate per prompt.
- **generator** (`torch.Generator`, *optional*) --
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic.
- **latents** (`torch.IntTensor`, *optional*) --
Pre-generated tokens representing latent vectors in `self.vqvae`, to be used as inputs for image
generation. If not provided, the starting latents will be completely masked.
- **prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. A single vector from the
pooled and projected final hidden states (see the sketch after the example below).
- **encoder_hidden_states** (`torch.Tensor`, *optional*) --
Pre-generated penultimate hidden states from the text encoder providing additional text conditioning.
- **negative_prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
- **negative_encoder_hidden_states** (`torch.Tensor`, *optional*) --
Analogous to `encoder_hidden_states` for the positive prompt.
- **output_type** (`str`, *optional*, defaults to `"pil"`) --
The output format of the generated image. Choose between `PIL.Image` and `np.array`.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return an [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) instead of a
plain tuple.
- **callback** (`Callable`, *optional*) --
A function called every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
- **callback_steps** (`int`, *optional*, defaults to 1) --
The frequency at which the `callback` function is called. If not specified, the callback is called at
every step.
- **cross_attention_kwargs** (`dict`, *optional*) --
A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined in
[`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
- **micro_conditioning_aesthetic_score** (`int`, *optional*, defaults to 6) --
The targeted aesthetic score according to the LAION aesthetic classifier. See
https://laion.ai/blog/laion-aesthetics/ and the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **micro_conditioning_crop_coord** (`Tuple[int]`, *optional*, defaults to (0, 0)) --
The targeted height, width crop coordinates. See the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **temperature** (`Union[int, Tuple[int, int], List[int]]`, *optional*, defaults to (2, 0)) --
Configures the temperature scheduler on `self.scheduler`; see `AmusedScheduler#set_timesteps`.</paramsdesc><paramgroups>0</paramgroups><rettype>[ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) or `tuple`</rettype><retdesc>If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) is returned, otherwise a
`tuple` is returned where the first element is a list with the generated images.</retdesc></docstring>
The call function to the pipeline for generation.
<ExampleCodeBlock anchor="diffusers.AmusedPipeline.__call__.example">
Examples:
```py
>>> import torch
>>> from diffusers import AmusedPipeline

>>> pipe = AmusedPipeline.from_pretrained("amused/amused-512", variant="fp16", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")

>>> prompt = "a photo of an astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
```
</ExampleCodeBlock>
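The `prompt_embeds`/`encoder_hidden_states` pair (and their negative counterparts) can also be precomputed once and reused across calls. The sketch below is an assumption about suitable preprocessing rather than the pipeline's exact internals: it takes the pooled, projected `text_embeds` as `prompt_embeds` and the penultimate hidden states as `encoder_hidden_states`, mirroring the parameter descriptions above, and reuses the `pipe` from the example.

```py
def encode(pipe, text):
    # Tokenizer settings here are an assumption for illustration,
    # not necessarily identical to the pipeline's own preprocessing.
    inputs = pipe.tokenizer(
        text,
        return_tensors="pt",
        padding="max_length",
        truncation=True,
        max_length=pipe.tokenizer.model_max_length,
    ).to(pipe.device)
    outputs = pipe.text_encoder(**inputs, return_dict=True, output_hidden_states=True)
    # Pooled and projected final hidden state -> `prompt_embeds`;
    # penultimate hidden states -> `encoder_hidden_states`.
    return outputs.text_embeds, outputs.hidden_states[-2]


prompt_embeds, encoder_hidden_states = encode(pipe, "a photo of an astronaut riding a horse on mars")
negative_prompt_embeds, negative_encoder_hidden_states = encode(pipe, "")

image = pipe(
    prompt_embeds=prompt_embeds,
    encoder_hidden_states=encoder_hidden_states,
    negative_prompt_embeds=negative_prompt_embeds,
    negative_encoder_hidden_states=negative_encoder_hidden_states,
).images[0]
```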
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedPipeline.enable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1921</source><parameters>[{"name": "attention_op", "val": ": typing.Optional[typing.Callable] = None"}]</parameters><paramsdesc>- **attention_op** (`Callable`, *optional*) --
Override the default `None` operator for use as `op` argument to the
[`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention)
function of xFormers.</paramsdesc><paramgroups>0</paramgroups></docstring>
Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). When this
option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed
up during training is not guaranteed.

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
| <ExampleCodeBlock anchor="diffusers.AmusedPipeline.enable_xformers_memory_efficient_attention.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import DiffusionPipeline | |
| >>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp | |
| >>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16) | |
| >>> pipe = pipe.to("cuda") | |
| >>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp) | |
| >>> # Workaround for not accepting attention shape using VAE for Flash Attention | |
| >>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedPipeline.disable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1952</source><parameters>[]</parameters></docstring> | |
| Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). | |
| </div></div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
<docstring><name>class diffusers.AmusedImg2ImgPipeline</name><anchor>diffusers.AmusedImg2ImgPipeline</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L60</source><parameters>[{"name": "vqvae", "val": ": VQModel"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "text_encoder", "val": ": CLIPTextModelWithProjection"}, {"name": "transformer", "val": ": UVit2DModel"}, {"name": "scheduler", "val": ": AmusedScheduler"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__call__</name><anchor>diffusers.AmusedImg2ImgPipeline.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_img2img.py#L98</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None"}, {"name": "strength", "val": ": float = 0.5"}, {"name": "num_inference_steps", "val": ": int = 12"}, {"name": "guidance_scale", "val": ": float = 10.0"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "num_images_per_prompt", "val": ": typing.Optional[int] = 1"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "output_type", "val": " = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "callback", "val": ": typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None"}, {"name": "callback_steps", "val": ": int = 1"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "micro_conditioning_aesthetic_score", "val": ": int = 6"}, {"name": "micro_conditioning_crop_coord", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "temperature", "val": ": typing.Union[int, typing.Tuple[int, int], typing.List[int]] = (2, 0)"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) --
`Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is between `[0, 1]`. If it's a tensor or a list
of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image
latents as `image`, but if passing latents directly they are not encoded again.
- **strength** (`float`, *optional*, defaults to 0.5) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising
process runs for the full number of iterations specified in `num_inference_steps`. A value of 1
essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale <= 1`).
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
The number of images to generate per prompt.
- **generator** (`torch.Generator`, *optional*) --
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic.
- **prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. A single vector from the
pooled and projected final hidden states.
- **encoder_hidden_states** (`torch.Tensor`, *optional*) --
Pre-generated penultimate hidden states from the text encoder providing additional text conditioning.
- **negative_prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
- **negative_encoder_hidden_states** (`torch.Tensor`, *optional*) --
Analogous to `encoder_hidden_states` for the positive prompt.
- **output_type** (`str`, *optional*, defaults to `"pil"`) --
The output format of the generated image. Choose between `PIL.Image` and `np.array`.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return an [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) instead of a
plain tuple.
- **callback** (`Callable`, *optional*) --
A function called every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
- **callback_steps** (`int`, *optional*, defaults to 1) --
The frequency at which the `callback` function is called. If not specified, the callback is called at
every step.
- **cross_attention_kwargs** (`dict`, *optional*) --
A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined in
[`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
- **micro_conditioning_aesthetic_score** (`int`, *optional*, defaults to 6) --
The targeted aesthetic score according to the LAION aesthetic classifier. See
https://laion.ai/blog/laion-aesthetics/ and the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **micro_conditioning_crop_coord** (`Tuple[int]`, *optional*, defaults to (0, 0)) --
The targeted height, width crop coordinates. See the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **temperature** (`Union[int, Tuple[int, int], List[int]]`, *optional*, defaults to (2, 0)) --
Configures the temperature scheduler on `self.scheduler`; see `AmusedScheduler#set_timesteps`.</paramsdesc><paramgroups>0</paramgroups><rettype>[ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) or `tuple`</rettype><retdesc>If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) is returned, otherwise a
`tuple` is returned where the first element is a list with the generated images.</retdesc></docstring>
The call function to the pipeline for generation.
<ExampleCodeBlock anchor="diffusers.AmusedImg2ImgPipeline.__call__.example">
Examples:
```py
>>> import torch
>>> from diffusers import AmusedImg2ImgPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedImg2ImgPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "winter mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> image = pipe(prompt, input_image).images[0]
```
</ExampleCodeBlock>
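Lowering `strength` preserves more of the input image. Continuing the example above with an illustrative value:

```py
# strength=0.3 (an arbitrary example value) adds less noise, so the result
# stays closer to `input_image` than the default strength of 0.5.
image = pipe(prompt, input_image, strength=0.3).images[0]
```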
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedImg2ImgPipeline.enable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1921</source><parameters>[{"name": "attention_op", "val": ": typing.Optional[typing.Callable] = None"}]</parameters><paramsdesc>- **attention_op** (`Callable`, *optional*) --
Override the default `None` operator for use as `op` argument to the
[`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention)
function of xFormers.</paramsdesc><paramgroups>0</paramgroups></docstring>
Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). When this
option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed
up during training is not guaranteed.

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
| <ExampleCodeBlock anchor="diffusers.AmusedImg2ImgPipeline.enable_xformers_memory_efficient_attention.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import DiffusionPipeline | |
| >>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp | |
| >>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16) | |
| >>> pipe = pipe.to("cuda") | |
| >>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp) | |
| >>> # Workaround for not accepting attention shape using VAE for Flash Attention | |
| >>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedImg2ImgPipeline.disable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1952</source><parameters>[]</parameters></docstring> | |
| Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). | |
| </div></div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
<docstring><name>class diffusers.AmusedInpaintPipeline</name><anchor>diffusers.AmusedInpaintPipeline</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L68</source><parameters>[{"name": "vqvae", "val": ": VQModel"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "text_encoder", "val": ": CLIPTextModelWithProjection"}, {"name": "transformer", "val": ": UVit2DModel"}, {"name": "scheduler", "val": ": AmusedScheduler"}]</parameters></docstring>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>__call__</name><anchor>diffusers.AmusedInpaintPipeline.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/amused/pipeline_amused_inpaint.py#L114</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None"}, {"name": "mask_image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None"}, {"name": "strength", "val": ": float = 1.0"}, {"name": "num_inference_steps", "val": ": int = 12"}, {"name": "guidance_scale", "val": ": float = 10.0"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "num_images_per_prompt", "val": ": typing.Optional[int] = 1"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_encoder_hidden_states", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "output_type", "val": " = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "callback", "val": ": typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None"}, {"name": "callback_steps", "val": ": int = 1"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "micro_conditioning_aesthetic_score", "val": ": int = 6"}, {"name": "micro_conditioning_crop_coord", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "temperature", "val": ": typing.Union[int, typing.Tuple[int, int], typing.List[int]] = (2, 0)"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide image generation. If not defined, you need to pass `prompt_embeds`.
- **image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) --
`Image`, numpy array or tensor representing an image batch to be used as the starting point. For both
numpy arrays and pytorch tensors, the expected value range is between `[0, 1]`. If it's a tensor or a list
of tensors, the expected shape should be `(B, C, H, W)` or `(C, H, W)`. If it is a numpy array or a
list of arrays, the expected shape should be `(B, H, W, C)` or `(H, W, C)`. It can also accept image
latents as `image`, but if passing latents directly they are not encoded again.
- **mask_image** (`torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, `List[torch.Tensor]`, `List[PIL.Image.Image]`, or `List[np.ndarray]`) --
`Image`, numpy array or tensor representing an image batch to mask `image`. White pixels in the mask
are repainted while black pixels are preserved. If `mask_image` is a PIL image, it is converted to a
single channel (luminance) before use. If it's a numpy array or pytorch tensor, it should contain one
color channel (L) instead of 3, so the expected shape for a pytorch tensor would be `(B, 1, H, W)`, `(B,
H, W)`, `(1, H, W)`, or `(H, W)`, and for a numpy array `(B, H, W, 1)`, `(B, H, W)`, `(H, W,
1)`, or `(H, W)`.
- **strength** (`float`, *optional*, defaults to 1.0) --
Indicates the extent to transform the reference `image`. Must be between 0 and 1. `image` is used as a
starting point and more noise is added the higher the `strength`. The number of denoising steps depends
on the amount of noise initially added. When `strength` is 1, added noise is maximum and the denoising
process runs for the full number of iterations specified in `num_inference_steps`. A value of 1
essentially ignores `image`.
- **num_inference_steps** (`int`, *optional*, defaults to 12) --
The number of denoising steps. More denoising steps usually lead to a higher quality image at the
expense of slower inference.
- **guidance_scale** (`float`, *optional*, defaults to 10.0) --
A higher guidance scale value encourages the model to generate images closely linked to the text
`prompt` at the expense of lower image quality. Guidance scale is enabled when `guidance_scale > 1`.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
The prompt or prompts to guide what to not include in image generation. If not defined, you need to
pass `negative_prompt_embeds` instead. Ignored when not using guidance (`guidance_scale <= 1`).
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
The number of images to generate per prompt.
- **generator** (`torch.Generator`, *optional*) --
A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make
generation deterministic.
- **prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not
provided, text embeddings are generated from the `prompt` input argument. A single vector from the
pooled and projected final hidden states.
- **encoder_hidden_states** (`torch.Tensor`, *optional*) --
Pre-generated penultimate hidden states from the text encoder providing additional text conditioning.
- **negative_prompt_embeds** (`torch.Tensor`, *optional*) --
Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If
not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument.
- **negative_encoder_hidden_states** (`torch.Tensor`, *optional*) --
Analogous to `encoder_hidden_states` for the positive prompt.
- **output_type** (`str`, *optional*, defaults to `"pil"`) --
The output format of the generated image. Choose between `PIL.Image` and `np.array`.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return an [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) instead of a
plain tuple.
- **callback** (`Callable`, *optional*) --
A function called every `callback_steps` steps during inference. The function is called with the
following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`.
- **callback_steps** (`int`, *optional*, defaults to 1) --
The frequency at which the `callback` function is called. If not specified, the callback is called at
every step.
- **cross_attention_kwargs** (`dict`, *optional*) --
A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined in
[`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
- **micro_conditioning_aesthetic_score** (`int`, *optional*, defaults to 6) --
The targeted aesthetic score according to the LAION aesthetic classifier. See
https://laion.ai/blog/laion-aesthetics/ and the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **micro_conditioning_crop_coord** (`Tuple[int]`, *optional*, defaults to (0, 0)) --
The targeted height, width crop coordinates. See the micro-conditioning section of
https://huggingface.co/papers/2307.01952.
- **temperature** (`Union[int, Tuple[int, int], List[int]]`, *optional*, defaults to (2, 0)) --
Configures the temperature scheduler on `self.scheduler`; see `AmusedScheduler#set_timesteps`.</paramsdesc><paramgroups>0</paramgroups><rettype>[ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) or `tuple`</rettype><retdesc>If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/dit#diffusers.ImagePipelineOutput) is returned, otherwise a
`tuple` is returned where the first element is a list with the generated images.</retdesc></docstring>
The call function to the pipeline for generation.
<ExampleCodeBlock anchor="diffusers.AmusedInpaintPipeline.__call__.example">
Examples:
```py
>>> import torch
>>> from diffusers import AmusedInpaintPipeline
>>> from diffusers.utils import load_image

>>> pipe = AmusedInpaintPipeline.from_pretrained(
...     "amused/amused-512", variant="fp16", torch_dtype=torch.float16
... )
>>> pipe = pipe.to("cuda")

>>> prompt = "fall mountains"
>>> input_image = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1.jpg"
...     )
...     .resize((512, 512))
...     .convert("RGB")
... )
>>> mask = (
...     load_image(
...         "https://huggingface.co/datasets/diffusers/docs-images/resolve/main/open_muse/mountains_1_mask.png"
...     )
...     .resize((512, 512))
...     .convert("L")
... )
>>> pipe(prompt, input_image, mask).images[0].save("out.png")
```
</ExampleCodeBlock>
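To monitor progress, pass a `callback` with the `(step, timestep, latents)` signature documented above. Continuing the example, a hypothetical logging callback:

```py
def log_step(step, timestep, latents):
    # `latents` is the current latent token grid as a torch.Tensor.
    print(f"step {step} (timestep {timestep}): latents shape {tuple(latents.shape)}")


# Invoke the callback every 4 steps instead of every step.
image = pipe(prompt, input_image, mask, callback=log_step, callback_steps=4).images[0]
```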
</div>
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>enable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedInpaintPipeline.enable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1921</source><parameters>[{"name": "attention_op", "val": ": typing.Optional[typing.Callable] = None"}]</parameters><paramsdesc>- **attention_op** (`Callable`, *optional*) --
Override the default `None` operator for use as `op` argument to the
[`memory_efficient_attention()`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention)
function of xFormers.</paramsdesc><paramgroups>0</paramgroups></docstring>
Enable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). When this
option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed
up during training is not guaranteed.

> [!WARNING]
> When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.
| <ExampleCodeBlock anchor="diffusers.AmusedInpaintPipeline.enable_xformers_memory_efficient_attention.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import DiffusionPipeline | |
| >>> from xformers.ops import MemoryEfficientAttentionFlashAttentionOp | |
| >>> pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16) | |
| >>> pipe = pipe.to("cuda") | |
| >>> pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp) | |
| >>> # Workaround for not accepting attention shape using VAE for Flash Attention | |
| >>> pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None) | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_xformers_memory_efficient_attention</name><anchor>diffusers.AmusedInpaintPipeline.disable_xformers_memory_efficient_attention</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/pipeline_utils.py#L1952</source><parameters>[]</parameters></docstring> | |
| Disable memory efficient attention from [xFormers](https://facebookresearch.github.io/xformers/). | |
| </div></div> | |