| # LEDITS++ | |
| <div class="flex flex-wrap space-x-1"> | |
| <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/> | |
| </div> | |
| LEDITS++ was proposed in [LEDITS++: Limitless Image Editing using Text-to-Image Models](https://huggingface.co/papers/2311.16711) by Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, Apolinário Passos. | |
| The abstract from the paper is: | |
| *Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They either require time-consuming fine-tuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. LEDITS++'s novel inversion approach requires no tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods. The project page is available at https://leditsplusplus-project.static.hf.space .* | |
| > [!TIP] | |
| > You can find additional information about LEDITS++ on the [project page](https://leditsplusplus-project.static.hf.space/index.html) and try it out in a [demo](https://huggingface.co/spaces/editing-images/leditsplusplus). | |
| > [!WARNING] | |
| > Due to some backward compatibility issues with the current diffusers implementation of [DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler), this implementation of LEDITS++ can no longer guarantee perfect inversion. | |
| > This issue is unlikely to have any noticeable effects on applied use-cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp). | |
| We provide two distinct pipelines based on different pre-trained models. | |
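| Both pipelines share the same two-step workflow: `invert()` is called first, and the subsequent pipeline call performs the edit. The following is a minimal sketch of that workflow (reusing the example image from the docstrings below); the optional scheduler swap selects the edit-friendly DDPM inversion described in the `invert()` documentation. | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import DDIMScheduler, LEditsPPPipelineStableDiffusion | |
| >>> from diffusers.utils import load_image | |
| >>> pipe = LEditsPPPipelineStableDiffusion.from_pretrained( | |
| ...     "runwayml/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16 | |
| ... ).to("cuda") | |
| >>> # Optional: a DDIM scheduler triggers edit-friendly DDPM inversion; any other | |
| >>> # scheduler is automatically replaced by DPMSolverMultistepScheduler. | |
| >>> pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config) | |
| >>> image = load_image( | |
| ...     "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/cherry_blossom.png" | |
| ... ).resize((512, 512)) | |
| >>> _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.1) | |
| >>> edited_image = pipe(editing_prompt=["cherry blossom"], edit_guidance_scale=10.0).images[0] | |
| ``` | |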
| ## LEditsPPPipelineStableDiffusion[[diffusers.LEditsPPPipelineStableDiffusion]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.LEditsPPPipelineStableDiffusion</name><anchor>diffusers.LEditsPPPipelineStableDiffusion</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L269</source><parameters>[{"name": "vae", "val": ": AutoencoderKL"}, {"name": "text_encoder", "val": ": CLIPTextModel"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "unet", "val": ": UNet2DConditionModel"}, {"name": "scheduler", "val": ": typing.Union[diffusers.schedulers.scheduling_ddim.DDIMScheduler, diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler]"}, {"name": "safety_checker", "val": ": StableDiffusionSafetyChecker"}, {"name": "feature_extractor", "val": ": CLIPImageProcessor"}, {"name": "requires_safety_checker", "val": ": bool = True"}]</parameters><paramsdesc>- **vae** ([AutoencoderKL](/docs/diffusers/pr_12229/en/api/models/autoencoderkl#diffusers.AutoencoderKL)) -- | |
| Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations. | |
| - **text_encoder** ([CLIPTextModel](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTextModel)) -- | |
| Frozen text-encoder. Stable Diffusion uses the text portion of | |
| [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel), specifically | |
| the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant. | |
| - **tokenizer** ([CLIPTokenizer](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTokenizer)) -- | |
| Tokenizer of class | |
| [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer). | |
| - **unet** ([UNet2DConditionModel](/docs/diffusers/pr_12229/en/api/models/unet2d-cond#diffusers.UNet2DConditionModel)) -- Conditional U-Net architecture to denoise the encoded image latents. | |
| - **scheduler** ([DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler) or [DDIMScheduler](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.DDIMScheduler)) -- | |
| A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be one of | |
| [DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler) or [DDIMScheduler](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.DDIMScheduler). If any other scheduler is passed it will | |
| automatically be set to [DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler). | |
| - **safety_checker** (`StableDiffusionSafetyChecker`) -- | |
| Classification module that estimates whether generated images could be considered offensive or harmful. | |
| Please refer to the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-4) for details. | |
| - **feature_extractor** ([CLIPImageProcessor](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPImageProcessor)) -- | |
| Model that extracts features from generated images to be used as inputs for the `safety_checker`.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Pipeline for textual image editing using LEDits++ with Stable Diffusion. | |
| This model inherits from [DiffusionPipeline](/docs/diffusers/pr_12229/en/api/pipelines/overview#diffusers.DiffusionPipeline) and builds on the [StableDiffusionPipeline](/docs/diffusers/pr_12229/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline). Check the superclass | |
| documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular | |
| device, etc.). | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>__call__</name><anchor>diffusers.LEditsPPPipelineStableDiffusion.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L773</source><parameters>[{"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "generator", "val": ": typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None"}, {"name": "output_type", "val": ": typing.Optional[str] = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "editing_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "editing_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "reverse_editing_direction", "val": ": typing.Union[bool, typing.List[bool], NoneType] = False"}, {"name": "edit_guidance_scale", "val": ": typing.Union[float, typing.List[float], NoneType] = 5"}, {"name": "edit_warmup_steps", "val": ": typing.Union[int, typing.List[int], NoneType] = 0"}, {"name": "edit_cooldown_steps", "val": ": typing.Union[int, typing.List[int], NoneType] = None"}, {"name": "edit_threshold", "val": ": typing.Union[float, typing.List[float], NoneType] = 0.9"}, {"name": "user_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "sem_guidance", "val": ": typing.Optional[typing.List[torch.Tensor]] = None"}, {"name": "use_cross_attn_mask", "val": ": bool = False"}, {"name": "use_intersect_mask", "val": ": bool = True"}, {"name": "attn_store_steps", "val": ": typing.Optional[typing.List[int]] = []"}, {"name": "store_averaged_over_steps", "val": ": bool = True"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "guidance_rescale", "val": ": float = 0.0"}, {"name": "clip_skip", "val": ": typing.Optional[int] = None"}, {"name": "callback_on_step_end", "val": ": typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None"}, {"name": "callback_on_step_end_tensor_inputs", "val": ": typing.List[str] = ['latents']"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **negative_prompt** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored | |
| if `guidance_scale` is less than `1`). | |
| - **generator** (`torch.Generator`, *optional*) -- | |
| One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) | |
| to make generation deterministic. | |
| - **output_type** (`str`, *optional*, defaults to `"pil"`) -- | |
| The output format of the generated image. Choose between | |
| [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to return a [LEditsPPDiffusionPipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.pipelines.LEditsPPDiffusionPipelineOutput) instead of a plain | |
| tuple. | |
| - **editing_prompt** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts to guide the image generation. The image is reconstructed by setting | |
| `editing_prompt = None`. Guidance direction of prompt should be specified via | |
| `reverse_editing_direction`. | |
| - **editing_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-computed embeddings to use for guiding the image generation. Guidance direction of embedding should | |
| be specified via `reverse_editing_direction`. | |
| - **negative_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If | |
| not provided, `negative_prompt_embeds` are generated from the `negative_prompt` input argument. | |
| - **reverse_editing_direction** (`bool` or `List[bool]`, *optional*, defaults to `False`) -- | |
| Whether the corresponding prompt in `editing_prompt` should be increased or decreased. | |
| - **edit_guidance_scale** (`float` or `List[float]`, *optional*, defaults to 5) -- | |
| Guidance scale for guiding the image generation. If provided as a list, values should correspond to | |
| `editing_prompt`. `edit_guidance_scale` is defined as `s_e` in equation 12 of the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). | |
| - **edit_warmup_steps** (`int` or `List[int]`, *optional*, defaults to 0) -- | |
| Number of diffusion steps (for each prompt) for which guidance will not be applied. | |
| - **edit_cooldown_steps** (`int` or `List[int]`, *optional*, defaults to `None`) -- | |
| Number of diffusion steps (for each prompt) after which guidance will no longer be applied. | |
| - **edit_threshold** (`float` or `List[float]`, *optional*, defaults to 0.9) -- | |
| Masking threshold of guidance. Threshold should be proportional to the image region that is modified. | |
| `edit_threshold` is defined as `λ` in equation 12 of the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). | |
| - **user_mask** (`torch.Tensor`, *optional*) -- | |
| User-provided mask for even better control over the editing process. This is helpful when LEDITS++'s | |
| implicit masks do not meet user preferences. | |
| - **sem_guidance** (`List[torch.Tensor]`, *optional*) -- | |
| List of pre-generated guidance vectors to be applied at generation. Length of the list has to | |
| correspond to `num_inference_steps`. | |
| - **use_cross_attn_mask** (`bool`, defaults to `False`) -- | |
| Whether cross-attention masks are used. Cross-attention masks are always used when `use_intersect_mask` | |
| is set to `True`. Cross-attention masks are defined as `M^1` in equation 12 of the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). | |
| - **use_intersect_mask** (`bool`, defaults to `True`) -- | |
| Whether the masking term is calculated as the intersection of cross-attention masks and masks derived from | |
| the noise estimate. Cross-attention masks are defined as `M^1` and masks derived from the noise estimate | |
| are defined as `M^2` in equation 12 of the [LEDITS++ paper](https://huggingface.co/papers/2311.16711). | |
| - **attn_store_steps** (`List[int]`, *optional*) -- | |
| Steps for which the attention maps are stored in the AttentionStore. Just for visualization purposes. | |
| - **store_averaged_over_steps** (`bool`, defaults to `True`) -- | |
| Whether the attention maps for the `attn_store_steps` are stored averaged over the diffusion steps. If | |
| `False`, attention maps for each step are stored separately. Just for visualization purposes. | |
| - **cross_attention_kwargs** (`dict`, *optional*) -- | |
| A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined in | |
| [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py). | |
| - **guidance_rescale** (`float`, *optional*, defaults to 0.0) -- | |
| Guidance rescale factor from [Common Diffusion Noise Schedules and Sample Steps are | |
| Flawed](https://huggingface.co/papers/2305.08891). Guidance rescale factor should fix overexposure when | |
| using zero terminal SNR. | |
| - **clip_skip** (`int`, *optional*) -- | |
| Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that | |
| the output of the pre-final layer will be used for computing the prompt embeddings. | |
| - **callback_on_step_end** (`Callable`, *optional*) -- | |
| A function that is called at the end of each denoising step during inference. The function is called | |
| with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, | |
| callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by | |
| `callback_on_step_end_tensor_inputs`. | |
| - **callback_on_step_end_tensor_inputs** (`List`, *optional*) -- | |
| The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list | |
| will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the | |
| `._callback_tensor_inputs` attribute of your pipeline class.</paramsdesc><paramgroups>0</paramgroups><rettype>[LEditsPPDiffusionPipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.pipelines.LEditsPPDiffusionPipelineOutput) or `tuple`</rettype><retdesc>[LEditsPPDiffusionPipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.pipelines.LEditsPPDiffusionPipelineOutput) if `return_dict` is True, otherwise a `tuple`. When | |
| returning a tuple, the first element is a list with the generated images, and the second element is a list | |
| of `bool`s denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) | |
| content, according to the `safety_checker`.</retdesc></docstring> | |
| The call function to the pipeline for editing. The | |
| [invert()](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.LEditsPPPipelineStableDiffusion.invert) method has to be called beforehand. Edits will | |
| always be performed for the last inverted image(s). | |
| <ExampleCodeBlock anchor="diffusers.LEditsPPPipelineStableDiffusion.__call__.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import LEditsPPPipelineStableDiffusion | |
| >>> from diffusers.utils import load_image | |
| >>> pipe = LEditsPPPipelineStableDiffusion.from_pretrained( | |
| ... "runwayml/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16 | |
| ... ) | |
| >>> pipe.enable_vae_tiling() | |
| >>> pipe = pipe.to("cuda") | |
| >>> img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/cherry_blossom.png" | |
| >>> image = load_image(img_url).resize((512, 512)) | |
| >>> _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.1) | |
| >>> edited_image = pipe( | |
| ... editing_prompt=["cherry blossom"], edit_guidance_scale=10.0, edit_threshold=0.75 | |
| ... ).images[0] | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>invert</name><anchor>diffusers.LEditsPPPipelineStableDiffusion.invert</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L1277</source><parameters>[{"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]]"}, {"name": "source_prompt", "val": ": str = ''"}, {"name": "source_guidance_scale", "val": ": float = 3.5"}, {"name": "num_inversion_steps", "val": ": int = 30"}, {"name": "skip", "val": ": float = 0.15"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "clip_skip", "val": ": typing.Optional[int] = None"}, {"name": "height", "val": ": typing.Optional[int] = None"}, {"name": "width", "val": ": typing.Optional[int] = None"}, {"name": "resize_mode", "val": ": typing.Optional[str] = 'default'"}, {"name": "crops_coords", "val": ": typing.Optional[typing.Tuple[int, int, int, int]] = None"}]</parameters><paramsdesc>- **image** (`PipelineImageInput`) -- | |
| Input for the image(s) that are to be edited. Multiple input images must share the same aspect | |
| ratio. | |
| - **source_prompt** (`str`, defaults to `""`) -- | |
| Prompt describing the input image that will be used for guidance during inversion. Guidance is disabled | |
| if the `source_prompt` is `""`. | |
| - **source_guidance_scale** (`float`, defaults to `3.5`) -- | |
| Strength of guidance during inversion. | |
| - **num_inversion_steps** (`int`, defaults to `30`) -- | |
| Number of total performed inversion steps after discarding the initial `skip` steps. | |
| - **skip** (`float`, defaults to `0.15`) -- | |
| Portion of initial steps that will be ignored for inversion and subsequent generation. Lower values | |
| will lead to stronger changes to the input image. `skip` has to be between `0` and `1`. | |
| - **generator** (`torch.Generator`, *optional*) -- | |
| A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make inversion | |
| deterministic. | |
| - **cross_attention_kwargs** (`dict`, *optional*) -- | |
| A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined in | |
| [`self.processor`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py). | |
| - **clip_skip** (`int`, *optional*) -- | |
| Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that | |
| the output of the pre-final layer will be used for computing the prompt embeddings. | |
| - **height** (`int`, *optional*, defaults to `None`) -- | |
| The height of the preprocessed image. If `None`, `get_default_height_width()` is used to get the | |
| default height. | |
| - **width** (`int`, *optional*, defaults to `None`) -- | |
| The width of the preprocessed image. If `None`, `get_default_height_width()` is used to get the default width. | |
| - **resize_mode** (`str`, *optional*, defaults to `default`) -- | |
| The resize mode; can be one of `default`, `fill`, or `crop`. If `default`, the image is resized to fit | |
| within the specified width and height, which may not maintain the original aspect ratio. If `fill`, the | |
| image is resized to fit within the specified width and height while maintaining the aspect ratio, and | |
| then centered within the dimensions, with empty areas filled with data from the image. If `crop`, the | |
| image is resized to fit within the specified width and height while maintaining the aspect ratio, and | |
| then centered within the dimensions, cropping the excess. Note that the `fill` and `crop` modes are only | |
| supported for PIL image input. | |
| - **crops_coords** (`Tuple[int, int, int, int]`, *optional*, defaults to `None`) -- | |
| The crop coordinates for each image in the batch. If `None`, will not crop the image.</paramsdesc><paramgroups>0</paramgroups><rettype>[LEditsPPInversionPipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.pipelines.LEditsPPInversionPipelineOutput)</rettype><retdesc>Output will contain the resized input image(s) | |
| and respective VAE reconstruction(s).</retdesc></docstring> | |
| The function of the pipeline for image inversion, as described in the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). If the scheduler is set to [DDIMScheduler](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.DDIMScheduler), the | |
| inversion proposed by [edit-friendly DDPM](https://huggingface.co/papers/2304.06140) will be performed instead. | |
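| A minimal usage sketch, assuming a pipeline and image loaded as in the `__call__` example above; `source_prompt` and `skip` are the main knobs: | |
| ```py | |
| >>> inv_output = pipe.invert( | |
| ...     image=image, | |
| ...     source_prompt="a photo of a cherry blossom tree",  # "" disables guidance during inversion | |
| ...     source_guidance_scale=3.5, | |
| ...     num_inversion_steps=50, | |
| ...     skip=0.15,  # lower values allow stronger changes to the input image | |
| ... ) | |
| >>> reconstruction = inv_output.vae_reconstruction_images[0]  # VAE reconstruction of the input | |
| ``` | |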
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_vae_slicing</name><anchor>diffusers.LEditsPPPipelineStableDiffusion.disable_vae_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L733</source><parameters>[]</parameters></docstring> | |
| Disable sliced VAE decoding. If `enable_vae_slicing` was previously enabled, this method will go back to | |
| computing decoding in one step. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_vae_tiling</name><anchor>diffusers.LEditsPPPipelineStableDiffusion.disable_vae_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L760</source><parameters>[]</parameters></docstring> | |
| Disable tiled VAE decoding. If `enable_vae_tiling` was previously enabled, this method will go back to | |
| computing decoding in one step. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_vae_slicing</name><anchor>diffusers.LEditsPPPipelineStableDiffusion.enable_vae_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L720</source><parameters>[]</parameters></docstring> | |
| Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to | |
| compute decoding in several steps. This is useful to save some memory and allow larger batch sizes. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_vae_tiling</name><anchor>diffusers.LEditsPPPipelineStableDiffusion.enable_vae_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L746</source><parameters>[]</parameters></docstring> | |
| Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to | |
| compute decoding and encoding in several steps. This is useful for saving a large amount of memory and to allow | |
| processing larger images. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>encode_prompt</name><anchor>diffusers.LEditsPPPipelineStableDiffusion.encode_prompt</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion.py#L521</source><parameters>[{"name": "device", "val": ""}, {"name": "num_images_per_prompt", "val": ""}, {"name": "enable_edit_guidance", "val": ""}, {"name": "negative_prompt", "val": " = None"}, {"name": "editing_prompt", "val": " = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "editing_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "lora_scale", "val": ": typing.Optional[float] = None"}, {"name": "clip_skip", "val": ": typing.Optional[int] = None"}]</parameters><paramsdesc>- **device** -- (`torch.device`): | |
| torch device | |
| - **num_images_per_prompt** (`int`) -- | |
| number of images that should be generated per prompt | |
| - **enable_edit_guidance** (`bool`) -- | |
| whether to perform any editing or reconstruct the input image instead | |
| - **negative_prompt** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation. If not defined, one has to pass | |
| `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is | |
| less than `1`). | |
| - **editing_prompt** (`str` or `List[str]`, *optional*) -- | |
| Editing prompt(s) to be encoded. If not defined, one has to pass `editing_prompt_embeds` instead. | |
| - **editing_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not | |
| provided, text embeddings will be generated from `prompt` input argument. | |
| - **negative_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt | |
| weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input | |
| argument. | |
| - **lora_scale** (`float`, *optional*) -- | |
| A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded. | |
| - **clip_skip** (`int`, *optional*) -- | |
| Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that | |
| the output of the pre-final layer will be used for computing the prompt embeddings.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Encodes the prompt into text encoder hidden states. | |
| </div></div> | |
| ## LEditsPPPipelineStableDiffusionXL[[diffusers.LEditsPPPipelineStableDiffusionXL]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.LEditsPPPipelineStableDiffusionXL</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L275</source><parameters>[{"name": "vae", "val": ": AutoencoderKL"}, {"name": "text_encoder", "val": ": CLIPTextModel"}, {"name": "text_encoder_2", "val": ": CLIPTextModelWithProjection"}, {"name": "tokenizer", "val": ": CLIPTokenizer"}, {"name": "tokenizer_2", "val": ": CLIPTokenizer"}, {"name": "unet", "val": ": UNet2DConditionModel"}, {"name": "scheduler", "val": ": typing.Union[diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler, diffusers.schedulers.scheduling_ddim.DDIMScheduler]"}, {"name": "image_encoder", "val": ": CLIPVisionModelWithProjection = None"}, {"name": "feature_extractor", "val": ": CLIPImageProcessor = None"}, {"name": "force_zeros_for_empty_prompt", "val": ": bool = True"}, {"name": "add_watermarker", "val": ": typing.Optional[bool] = None"}]</parameters><paramsdesc>- **vae** ([AutoencoderKL](/docs/diffusers/pr_12229/en/api/models/autoencoderkl#diffusers.AutoencoderKL)) -- | |
| Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations. | |
| - **text_encoder** ([CLIPTextModel](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTextModel)) -- | |
| Frozen text-encoder. Stable Diffusion XL uses the text portion of | |
| [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel), specifically | |
| the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant. | |
| - **text_encoder_2** ([CLIPTextModelWithProjection](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTextModelWithProjection)) -- | |
| Second frozen text-encoder. Stable Diffusion XL uses the text and pool portion of | |
| [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModelWithProjection), | |
| specifically the | |
| [laion/CLIP-ViT-bigG-14-laion2B-39B-b160k](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k) | |
| variant. | |
| - **tokenizer** ([CLIPTokenizer](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTokenizer)) -- | |
| Tokenizer of class | |
| [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer). | |
| - **tokenizer_2** ([CLIPTokenizer](https://huggingface.co/docs/transformers/main/en/model_doc/clip#transformers.CLIPTokenizer)) -- | |
| Second Tokenizer of class | |
| [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer). | |
| - **unet** ([UNet2DConditionModel](/docs/diffusers/pr_12229/en/api/models/unet2d-cond#diffusers.UNet2DConditionModel)) -- Conditional U-Net architecture to denoise the encoded image latents. | |
| - **scheduler** ([DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler) or [DDIMScheduler](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.DDIMScheduler)) -- | |
| A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be one of | |
| [DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler) or [DDIMScheduler](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.DDIMScheduler). If any other scheduler is passed it will | |
| automatically be set to [DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler). | |
| - **force_zeros_for_empty_prompt** (`bool`, *optional*, defaults to `True`) -- | |
| Whether the negative prompt embeddings should always be forced to 0. Also see the config of | |
| `stabilityai/stable-diffusion-xl-base-1.0`. | |
| - **add_watermarker** (`bool`, *optional*) -- | |
| Whether to use the [invisible_watermark library](https://github.com/ShieldMnt/invisible-watermark/) to | |
| watermark output images. If not defined, it will default to `True` if the package is installed; otherwise no | |
| watermarker will be used.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Pipeline for textual image editing using LEDits++ with Stable Diffusion XL. | |
| This model inherits from [DiffusionPipeline](/docs/diffusers/pr_12229/en/api/pipelines/overview#diffusers.DiffusionPipeline) and builds on the [StableDiffusionXLPipeline](/docs/diffusers/pr_12229/en/api/pipelines/stable_diffusion/stable_diffusion_xl#diffusers.StableDiffusionXLPipeline). Check the | |
| superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a | |
| particular device, etc.). | |
| In addition, the pipeline inherits the following loading methods (a usage sketch follows below): | |
| - *LoRA*: [LEditsPPPipelineStableDiffusionXL.load_lora_weights()](/docs/diffusers/pr_12229/en/api/loaders/lora#diffusers.loaders.StableDiffusionXLLoraLoaderMixin.load_lora_weights) | |
| - *Ckpt*: [loaders.FromSingleFileMixin.from_single_file()](/docs/diffusers/pr_12229/en/api/loaders/single_file#diffusers.loaders.FromSingleFileMixin.from_single_file) | |
| as well as the following saving methods: | |
| - *LoRA*: `loaders.StableDiffusionXLPipeline.save_lora_weights` | |
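| A minimal sketch of the inherited LoRA loading; the adapter repository id here is hypothetical: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import LEditsPPPipelineStableDiffusionXL | |
| >>> pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained( | |
| ...     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16 | |
| ... ) | |
| >>> pipe.load_lora_weights("some-user/some-sdxl-lora")  # hypothetical adapter repository | |
| ``` | |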
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>__call__</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L847</source><parameters>[{"name": "denoising_end", "val": ": typing.Optional[float] = None"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "negative_prompt_2", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_pooled_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "ip_adapter_image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None"}, {"name": "output_type", "val": ": typing.Optional[str] = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "guidance_rescale", "val": ": float = 0.0"}, {"name": "crops_coords_top_left", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "target_size", "val": ": typing.Optional[typing.Tuple[int, int]] = None"}, {"name": "editing_prompt", "val": ": typing.Union[str, typing.List[str], NoneType] = None"}, {"name": "editing_prompt_embeddings", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "editing_pooled_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "reverse_editing_direction", "val": ": typing.Union[bool, typing.List[bool], NoneType] = False"}, {"name": "edit_guidance_scale", "val": ": typing.Union[float, typing.List[float], NoneType] = 5"}, {"name": "edit_warmup_steps", "val": ": typing.Union[int, typing.List[int], NoneType] = 0"}, {"name": "edit_cooldown_steps", "val": ": typing.Union[int, typing.List[int], NoneType] = None"}, {"name": "edit_threshold", "val": ": typing.Union[float, typing.List[float], NoneType] = 0.9"}, {"name": "sem_guidance", "val": ": typing.Optional[typing.List[torch.Tensor]] = None"}, {"name": "use_cross_attn_mask", "val": ": bool = False"}, {"name": "use_intersect_mask", "val": ": bool = False"}, {"name": "user_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "attn_store_steps", "val": ": typing.Optional[typing.List[int]] = []"}, {"name": "store_averaged_over_steps", "val": ": bool = True"}, {"name": "clip_skip", "val": ": typing.Optional[int] = None"}, {"name": "callback_on_step_end", "val": ": typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None"}, {"name": "callback_on_step_end_tensor_inputs", "val": ": typing.List[str] = ['latents']"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **denoising_end** (`float`, *optional*) -- | |
| When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be | |
| completed before it is intentionally prematurely terminated. As a result, the returned sample will | |
| still retain a substantial amount of noise as determined by the discrete timesteps selected by the | |
| scheduler. The `denoising_end` parameter should ideally be utilized when this pipeline forms a part of a | |
| "Mixture of Denoisers" multi-pipeline setup, as elaborated in [**Refining the Image | |
| Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output). | |
| - **negative_prompt** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation. If not defined, one has to pass | |
| `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is | |
| less than `1`). | |
| - **negative_prompt_2** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and | |
| `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders | |
| - **negative_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt | |
| weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input | |
| argument. | |
| - **negative_pooled_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt | |
| weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt` | |
| input argument. | |
| - **ip_adapter_image** (`PipelineImageInput`, *optional*) -- | |
| Optional image input to work with IP Adapters. | |
| - **output_type** (`str`, *optional*, defaults to `"pil"`) -- | |
| The output format of the generated image. Choose between | |
| [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to return a `~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput` instead | |
| of a plain tuple. | |
| - **callback** (`Callable`, *optional*) -- | |
| A function that will be called every `callback_steps` steps during inference. The function will be | |
| called with the following arguments: `callback(step: int, timestep: int, latents: torch.Tensor)`. | |
| - **callback_steps** (`int`, *optional*, defaults to 1) -- | |
| The frequency at which the `callback` function will be called. If not specified, the callback will be | |
| called at every step. | |
| - **cross_attention_kwargs** (`dict`, *optional*) -- | |
| A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under | |
| `self.processor` in | |
| [diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py). | |
| - **guidance_rescale** (`float`, *optional*, defaults to 0.0) -- | |
| Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are | |
| Flawed](https://huggingface.co/papers/2305.08891). `guidance_rescale` is defined as `φ` in equation 16 of | |
| the paper. Guidance rescale factor should fix overexposure when | |
| using zero terminal SNR. | |
| - **crops_coords_top_left** (`Tuple[int]`, *optional*, defaults to (0, 0)) -- | |
| `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position | |
| `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting | |
| `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of | |
| [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). | |
| - **target_size** (`Tuple[int]`, *optional*, defaults to (1024, 1024)) -- | |
| For most cases, `target_size` should be set to the desired height and width of the generated image. If | |
| not specified it will default to `(width, height)`. Part of SDXL's micro-conditioning as explained in | |
| section 2.2 of [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). | |
| - **editing_prompt** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts to guide the image generation. The image is reconstructed by setting | |
| `editing_prompt = None`. Guidance direction of prompt should be specified via | |
| `reverse_editing_direction`. | |
| - **editing_prompt_embeddings** (`torch.Tensor`, *optional*) -- | |
| Pre-generated edit text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. | |
| If not provided, editing_prompt_embeddings will be generated from `editing_prompt` input argument. | |
| - **editing_pooled_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated pooled edit text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt | |
| weighting. If not provided, `editing_pooled_prompt_embeds` will be generated from the `editing_prompt` | |
| input argument. | |
| - **reverse_editing_direction** (`bool` or `List[bool]`, *optional*, defaults to `False`) -- | |
| Whether the corresponding prompt in `editing_prompt` should be increased or decreased. | |
| - **edit_guidance_scale** (`float` or `List[float]`, *optional*, defaults to 5) -- | |
| Guidance scale for guiding the image generation. If provided as a list, values should correspond to | |
| `editing_prompt`. `edit_guidance_scale` is defined as `s_e` in equation 12 of the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). | |
| - **edit_warmup_steps** (`int` or `List[int]`, *optional*, defaults to 0) -- | |
| Number of diffusion steps (for each prompt) for which guidance is not applied. | |
| - **edit_cooldown_steps** (`int` or `List[int]`, *optional*, defaults to `None`) -- | |
| Number of diffusion steps (for each prompt) after which guidance is no longer applied. | |
| - **edit_threshold** (`float` or `List[float]`, *optional*, defaults to 0.9) -- | |
| Masking threshold of guidance. Threshold should be proportional to the image region that is modified. | |
| `edit_threshold` is defined as `λ` in equation 12 of the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). | |
| - **sem_guidance** (`List[torch.Tensor]`, *optional*) -- | |
| List of pre-generated guidance vectors to be applied at generation. Length of the list has to | |
| correspond to `num_inference_steps`. | |
| - **use_cross_attn_mask** (`bool`, defaults to `False`) -- | |
| Whether cross-attention masks are used. Cross-attention masks are always used when `use_intersect_mask` | |
| is set to `True`. Cross-attention masks are defined as `M^1` in equation 12 of the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). | |
| - **use_intersect_mask** (`bool`, defaults to `False`) -- | |
| Whether the masking term is calculated as the intersection of cross-attention masks and masks derived from | |
| the noise estimate. Cross-attention masks are defined as `M^1` and masks derived from the noise estimate | |
| are defined as `M^2` in equation 12 of the [LEDITS++ paper](https://huggingface.co/papers/2311.16711). | |
| - **user_mask** (`torch.Tensor`, *optional*) -- | |
| User-provided mask for even better control over the editing process. This is helpful when LEDITS++'s | |
| implicit masks do not meet user preferences. | |
| - **attn_store_steps** (`List[int]`, *optional*) -- | |
| Steps for which the attention maps are stored in the AttentionStore. Just for visualization purposes. | |
| - **store_averaged_over_steps** (`bool`, defaults to `True`) -- | |
| Whether the attention maps for the `attn_store_steps` are stored averaged over the diffusion steps. If | |
| `False`, attention maps for each step are stored separately. Just for visualization purposes. | |
| - **clip_skip** (`int`, *optional*) -- | |
| Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that | |
| the output of the pre-final layer will be used for computing the prompt embeddings. | |
| - **callback_on_step_end** (`Callable`, *optional*) -- | |
| A function that is called at the end of each denoising step during inference. The function is called | |
| with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, | |
| callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by | |
| `callback_on_step_end_tensor_inputs`. | |
| - **callback_on_step_end_tensor_inputs** (`List`, *optional*) -- | |
| The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list | |
| will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the | |
| `._callback_tensor_inputs` attribute of your pipeline class.</paramsdesc><paramgroups>0</paramgroups><rettype>[LEditsPPDiffusionPipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.pipelines.LEditsPPDiffusionPipelineOutput) or `tuple`</rettype><retdesc>[LEditsPPDiffusionPipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.pipelines.LEditsPPDiffusionPipelineOutput) if `return_dict` is True, otherwise a `tuple`. When | |
| returning a tuple, the first element is a list with the generated images.</retdesc></docstring> | |
| The call function to the pipeline for editing. The | |
| [invert()](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.LEditsPPPipelineStableDiffusionXL.invert) method has to be called beforehand. Edits | |
| will always be performed for the last inverted image(s). | |
| <ExampleCodeBlock anchor="diffusers.LEditsPPPipelineStableDiffusionXL.__call__.example"> | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import LEditsPPPipelineStableDiffusionXL | |
| >>> from diffusers.utils import load_image | |
| >>> pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained( | |
| ... "stabilityai/stable-diffusion-xl-base-1.0", variant="fp16", torch_dtype=torch.float16 | |
| ... ) | |
| >>> pipe.enable_vae_tiling() | |
| >>> pipe = pipe.to("cuda") | |
| >>> img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg" | |
| >>> image = load_image(img_url).resize((1024, 1024)) | |
| >>> _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.2) | |
| >>> edited_image = pipe( | |
| ... editing_prompt=["tennis ball", "tomato"], | |
| ... reverse_editing_direction=[True, False], | |
| ... edit_guidance_scale=[5.0, 10.0], | |
| ... edit_threshold=[0.9, 0.85], | |
| ... ).images[0] | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>invert</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.invert</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L1486</source><parameters>[{"name": "image", "val": ": typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]]"}, {"name": "source_prompt", "val": ": str = ''"}, {"name": "source_guidance_scale", "val": " = 3.5"}, {"name": "negative_prompt", "val": ": str = None"}, {"name": "negative_prompt_2", "val": ": str = None"}, {"name": "num_inversion_steps", "val": ": int = 50"}, {"name": "skip", "val": ": float = 0.15"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "crops_coords_top_left", "val": ": typing.Tuple[int, int] = (0, 0)"}, {"name": "num_zero_noise_steps", "val": ": int = 3"}, {"name": "cross_attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "height", "val": ": typing.Optional[int] = None"}, {"name": "width", "val": ": typing.Optional[int] = None"}, {"name": "resize_mode", "val": ": typing.Optional[str] = 'default'"}, {"name": "crops_coords", "val": ": typing.Optional[typing.Tuple[int, int, int, int]] = None"}]</parameters><paramsdesc>- **image** (`PipelineImageInput`) -- | |
| Input for the image(s) that are to be edited. Multiple input images must share the same aspect | |
| ratio. | |
| - **source_prompt** (`str`, defaults to `""`) -- | |
| Prompt describing the input image that will be used for guidance during inversion. Guidance is disabled | |
| if the `source_prompt` is `""`. | |
| - **source_guidance_scale** (`float`, defaults to `3.5`) -- | |
| Strength of guidance during inversion. | |
| - **negative_prompt** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation. If not defined, one has to pass | |
| `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is | |
| less than `1`). | |
| - **negative_prompt_2** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and | |
| `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders | |
| - **num_inversion_steps** (`int`, defaults to `50`) -- | |
| Number of total performed inversion steps after discarding the initial `skip` steps. | |
| - **skip** (`float`, defaults to `0.15`) -- | |
| Portion of initial steps that will be ignored for inversion and subsequent generation. Lower values | |
| will lead to stronger changes to the input image. `skip` has to be between `0` and `1`. | |
| - **generator** (`torch.Generator`, *optional*) -- | |
| A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make inversion | |
| deterministic. | |
| - **crops_coords_top_left** (`Tuple[int]`, *optional*, defaults to (0, 0)) -- | |
| `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position | |
| `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting | |
| `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of | |
| [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). | |
| - **num_zero_noise_steps** (`int`, defaults to `3`) -- | |
| Number of final diffusion steps that will not renoise the current image. If no steps are set to zero | |
| SD-XL in combination with [DPMSolverMultistepScheduler](/docs/diffusers/pr_12229/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler) will produce noise artifacts. | |
| - **cross_attention_kwargs** (`dict`, *optional*) -- | |
| A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under | |
| `self.processor` in | |
| [diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).</paramsdesc><paramgroups>0</paramgroups><rettype>[LEditsPPInversionPipelineOutput](/docs/diffusers/pr_12229/en/api/pipelines/ledits_pp#diffusers.pipelines.LEditsPPInversionPipelineOutput)</rettype><retdesc>Output will contain the resized input image(s) | |
| and respective VAE reconstruction(s).</retdesc></docstring> | |
| The function of the pipeline for image inversion, as described in the [LEDITS++ | |
| paper](https://huggingface.co/papers/2311.16711). If the scheduler is set to [DDIMScheduler](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.DDIMScheduler), the | |
| inversion proposed by [edit-friendly DDPM](https://huggingface.co/papers/2304.06140) will be performed instead. | |
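| A minimal usage sketch, assuming a pipeline and image loaded as in the `__call__` example above; `num_zero_noise_steps` keeps the final steps noise-free to avoid DPMSolver artifacts: | |
| ```py | |
| >>> inv_output = pipe.invert( | |
| ...     image=image, | |
| ...     num_inversion_steps=50, | |
| ...     skip=0.2, | |
| ...     num_zero_noise_steps=3,  # final diffusion steps that do not renoise the current image | |
| ... ) | |
| ``` | |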
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_vae_slicing</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.disable_vae_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L782</source><parameters>[]</parameters></docstring> | |
| Disable sliced VAE decoding. If `enable_vae_slicing` was previously enabled, this method will go back to | |
| computing decoding in one step. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_vae_tiling</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.disable_vae_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L809</source><parameters>[]</parameters></docstring> | |
| Disable tiled VAE decoding. If `enable_vae_tiling` was previously enabled, this method will go back to | |
| computing decoding in one step. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_vae_slicing</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.enable_vae_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L769</source><parameters>[]</parameters></docstring> | |
| Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to | |
| compute decoding in several steps. This is useful to save some memory and allow larger batch sizes. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_vae_tiling</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.enable_vae_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L795</source><parameters>[]</parameters></docstring> | |
| Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to | |
| compute decoding and encoding in several steps. This is useful for saving a large amount of memory and to allow | |
| processing larger images. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>encode_prompt</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.encode_prompt</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L402</source><parameters>[{"name": "device", "val": ": typing.Optional[torch.device] = None"}, {"name": "num_images_per_prompt", "val": ": int = 1"}, {"name": "negative_prompt", "val": ": typing.Optional[str] = None"}, {"name": "negative_prompt_2", "val": ": typing.Optional[str] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_pooled_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "lora_scale", "val": ": typing.Optional[float] = None"}, {"name": "clip_skip", "val": ": typing.Optional[int] = None"}, {"name": "enable_edit_guidance", "val": ": bool = True"}, {"name": "editing_prompt", "val": ": typing.Optional[str] = None"}, {"name": "editing_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "editing_pooled_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}]</parameters><paramsdesc>- **device** -- (`torch.device`): | |
| torch device | |
| - **num_images_per_prompt** (`int`) -- | |
| number of images that should be generated per prompt | |
| - **negative_prompt** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation. If not defined, one has to pass | |
| `negative_prompt_embeds` instead. | |
| - **negative_prompt_2** (`str` or `List[str]`, *optional*) -- | |
| The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and | |
| `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders | |
| - **negative_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt | |
| weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input | |
| argument. | |
| - **negative_pooled_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt | |
| weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt` | |
| input argument. | |
| - **lora_scale** (`float`, *optional*) -- | |
| A lora scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded. | |
| - **clip_skip** (`int`, *optional*) -- | |
| Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that | |
| the output of the pre-final layer will be used for computing the prompt embeddings. | |
| - **enable_edit_guidance** (`bool`) -- | |
| Whether to guide towards an editing prompt or not. | |
| - **editing_prompt** (`str` or `List[str]`, *optional*) -- | |
| Editing prompt(s) to be encoded. If not defined and 'enable_edit_guidance' is True, one has to pass | |
| `editing_prompt_embeds` instead. | |
| - **editing_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated edit text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. | |
| If not provided and 'enable_edit_guidance' is True, editing_prompt_embeds will be generated from | |
| `editing_prompt` input argument. | |
| - **editing_pooled_prompt_embeds** (`torch.Tensor`, *optional*) -- | |
| Pre-generated edit pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt | |
| weighting. If not provided, pooled editing_pooled_prompt_embeds will be generated from `editing_prompt` | |
| input argument.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Encodes the prompt into text encoder hidden states. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_guidance_scale_embedding</name><anchor>diffusers.LEditsPPPipelineStableDiffusionXL.get_guidance_scale_embedding</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_leditspp_stable_diffusion_xl.py#L708</source><parameters>[{"name": "w", "val": ": Tensor"}, {"name": "embedding_dim", "val": ": int = 512"}, {"name": "dtype", "val": ": dtype = torch.float32"}]</parameters><paramsdesc>- **w** (`torch.Tensor`) -- | |
| Generate embedding vectors with a specified guidance scale to subsequently enrich timestep embeddings. | |
| - **embedding_dim** (`int`, *optional*, defaults to 512) -- | |
| Dimension of the embeddings to generate. | |
| - **dtype** (`torch.dtype`, *optional*, defaults to `torch.float32`) -- | |
| Data type of the generated embeddings.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>Embedding vectors with shape `(len(w), embedding_dim)`.</retdesc></docstring> | |
| See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298 | |
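| A small usage sketch; the guidance scale values are illustrative: | |
| ```py | |
| >>> import torch | |
| >>> w = torch.tensor([7.5, 5.0])  # one guidance scale per batch element | |
| >>> emb = pipe.get_guidance_scale_embedding(w, embedding_dim=512) | |
| >>> emb.shape | |
| torch.Size([2, 512]) | |
| ``` | |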
| </div></div> | |
| ## LEditsPPDiffusionPipelineOutput[[diffusers.pipelines.LEditsPPDiffusionPipelineOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.pipelines.LEditsPPDiffusionPipelineOutput</name><anchor>diffusers.pipelines.LEditsPPDiffusionPipelineOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_output.py#L11</source><parameters>[{"name": "images", "val": ": typing.Union[typing.List[PIL.Image.Image], numpy.ndarray]"}, {"name": "nsfw_content_detected", "val": ": typing.Optional[typing.List[bool]]"}]</parameters><paramsdesc>- **images** (`List[PIL.Image.Image]` or `np.ndarray`) -- | |
| List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width, | |
| num_channels)`. | |
| - **nsfw_content_detected** (`List[bool]`) -- | |
| List indicating whether the corresponding generated image contains “not-safe-for-work” (nsfw) content or | |
| `None` if safety checking could not be performed.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output class for LEdits++ Diffusion pipelines. | |
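| A short access sketch, assuming a pipeline prepared as in the examples above: | |
| ```py | |
| >>> out = pipe(editing_prompt=["cherry blossom"], edit_guidance_scale=10.0) | |
| >>> edited_image = out.images[0] | |
| >>> out.nsfw_content_detected  # None if safety checking could not be performed | |
| ``` | |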
| </div> | |
| ## LEditsPPInversionPipelineOutput[[diffusers.pipelines.LEditsPPInversionPipelineOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.pipelines.LEditsPPInversionPipelineOutput</name><anchor>diffusers.pipelines.LEditsPPInversionPipelineOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/pipelines/ledits_pp/pipeline_output.py#L29</source><parameters>[{"name": "images", "val": ": typing.Union[typing.List[PIL.Image.Image], numpy.ndarray]"}, {"name": "vae_reconstruction_images", "val": ": typing.Union[typing.List[PIL.Image.Image], numpy.ndarray]"}]</parameters><paramsdesc>- **images** (`List[PIL.Image.Image]` or `np.ndarray`) -- | |
| List of the cropped and resized input images as PIL images of length `batch_size` or NumPy array of shape | |
| `(batch_size, height, width, num_channels)`. | |
| - **vae_reconstruction_images** (`List[PIL.Image.Image]` or `np.ndarray`) -- | |
| List of VAE reconstructions of all input images as PIL images of length `batch_size` or NumPy array of | |
| shape `(batch_size, height, width, num_channels)`.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output class for LEdits++ Diffusion pipelines. | |
| </div> | |
| <EditOnGithub source="https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/ledits_pp.md" /> |