This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
**Pipeline Type:** `StableDiffusionXLAutoBlocks`

**Description:** Auto modular pipeline for text-to-image, image-to-image, inpainting, and ControlNet tasks using Stable Diffusion XL.

This pipeline uses a 5-block architecture that can be customized and extended.
## Example Usage
[TODO]
## Pipeline Architecture
This modular pipeline is composed of the following blocks:
- text_encoder (`StableDiffusionXLTextEncoderStep`): Text encoder step that generates `text_embeddings` to guide the image generation.
- ip_adapter (`StableDiffusionXLAutoIPAdapterStep`): Runs the IP-Adapter step if `ip_adapter_image` is provided. This step should be placed before the `input` step.
  - ip_adapter (`StableDiffusionXLIPAdapterStep`): IP-Adapter step that prepares the IP-Adapter image embeddings.
- vae_encoder (`StableDiffusionXLAutoVaeEncoderStep`): VAE encoder step that encodes the image inputs into their latent representations.
  - inpaint (`StableDiffusionXLInpaintVaeEncoderStep`): VAE encoder step that prepares the image and mask for the inpainting process.
  - img2img (`StableDiffusionXLVaeEncoderStep`): VAE encoder step that encodes the input image into a latent representation.
- denoise (`StableDiffusionXLCoreDenoiseStep`): Core step that performs the denoising process.
  - input (`StableDiffusionXLInputStep`): Input processing step.
  - before_denoise (`StableDiffusionXLAutoBeforeDenoiseStep`): Before-denoise step that prepares the inputs for the denoise step.
  - controlnet_input (`StableDiffusionXLAutoControlNetInputStep`): ControlNet input step that prepares the ControlNet inputs.
  - denoise (`StableDiffusionXLAutoDenoiseStep`): Denoise step that iteratively denoises the latents. This is an auto pipeline block that works for text2img, img2img, and inpainting tasks, and can be used with or without ControlNet:
    - `StableDiffusionXLAutoControlNetDenoiseStep` (controlnet_denoise) is used when `controlnet_cond` is provided (supports ControlNet with text2img, img2img, and inpainting tasks).
    - `StableDiffusionXLInpaintDenoiseStep` (inpaint_denoise) is used when `mask` is provided (supports inpainting tasks).
    - `StableDiffusionXLDenoiseStep` (denoise) is used when neither `mask` nor `controlnet_cond` is provided (supports text2img and img2img tasks).
- decode (`StableDiffusionXLAutoDecodeStep`): Decode step that decodes the denoised latents into image outputs.
  - inpaint (`StableDiffusionXLInpaintDecodeStep`): Inpaint decode step that decodes the denoised latents into image outputs.
  - non-inpaint (`StableDiffusionXLDecodeStep`): Step that decodes the denoised latents into images.
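The dispatch rules for `StableDiffusionXLAutoDenoiseStep` listed above can be sketched in plain Python. This is a hypothetical illustration of the selection logic, not the actual Diffusers implementation; the function name and dict-based inputs are assumptions:

```python
# Hypothetical sketch of how the auto denoise block picks a sub-block
# based on which inputs are present (not the Diffusers source code).

def select_denoise_block(inputs: dict) -> str:
    """Return the name of the denoise sub-block to run."""
    if inputs.get("controlnet_cond") is not None:
        # ControlNet-guided denoising; works for text2img, img2img, and inpainting.
        return "controlnet_denoise"
    if inputs.get("mask") is not None:
        # Inpainting without ControlNet.
        return "inpaint_denoise"
    # Plain text2img / img2img.
    return "denoise"
```

Note that a ControlNet condition takes precedence: when both `controlnet_cond` and `mask` are provided, the ControlNet denoise sub-block handles the inpainting task.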
## Model Components

- text_encoder (`CLIPTextModel`)
- text_encoder_2 (`CLIPTextModelWithProjection`)
- tokenizer (`CLIPTokenizer`)
- tokenizer_2 (`CLIPTokenizer`)
- guider (`ClassifierFreeGuidance`)
- image_encoder (`CLIPVisionModelWithProjection`)
- feature_extractor (`CLIPImageProcessor`)
- unet (`UNet2DConditionModel`)
- vae (`AutoencoderKL`)
- image_processor (`VaeImageProcessor`)
- mask_processor (`VaeImageProcessor`)
- scheduler (`EulerDiscreteScheduler`)
- controlnet (`ControlNetUnionModel`)
- control_image_processor (`VaeImageProcessor`)
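The `guider` component applies classifier-free guidance. The sketch below shows the rule it implements, with plain floats standing in for noise-prediction tensors; the function name is illustrative, not the Diffusers API:

```python
# Minimal sketch of classifier-free guidance (CFG), the rule behind the
# ClassifierFreeGuidance guider component. Plain floats stand in for tensors.

def cfg(noise_uncond: float, noise_cond: float, guidance_scale: float) -> float:
    """Blend unconditional and conditional predictions:
    eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```

With `guidance_scale == 1.0` the result reduces to the conditional prediction; larger scales push the prediction further toward the prompt.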
## Configuration Parameters

- `force_zeros_for_empty_prompt` (default: `True`)
- `requires_aesthetics_score` (default: `False`)
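A hedged sketch of what `force_zeros_for_empty_prompt` controls, based on SDXL's standard behavior: when the flag is set and the negative prompt is empty, the negative embeddings are zero tensors rather than the encoding of an empty string. The helper below is hypothetical, using a list as a stand-in for an embedding tensor:

```python
# Illustrative sketch (not the Diffusers source) of the
# force_zeros_for_empty_prompt configuration flag.

def negative_prompt_embeds(negative_prompt, encode, embed_dim,
                           force_zeros_for_empty_prompt=True):
    """Return zeroed embeddings for an empty negative prompt when the
    config flag is set; otherwise encode the prompt as usual."""
    if force_zeros_for_empty_prompt and not negative_prompt:
        return [0.0] * embed_dim  # zeros instead of encoding ""
    return encode(negative_prompt)  # `encode` stands in for the text encoder
```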
## Input/Output Specification

### Inputs

Descriptions are given where the pipeline spec provides them.

Required:

- `latents` (`Any`)

Optional:

- `prompt` (`Any`)
- `prompt_2` (`Any`)
- `negative_prompt` (`Any`)
- `negative_prompt_2` (`Any`)
- `cross_attention_kwargs` (`Any`)
- `clip_skip` (`Any`)
- `ip_adapter_image` (`PIL.Image.Image | numpy.ndarray | torch.Tensor | list[PIL.Image.Image] | list[numpy.ndarray] | list[torch.Tensor]`): The image(s) to be used as IP-Adapter input.
- `height` (`Any`)
- `width` (`Any`)
- `image` (`Any`)
- `mask_image` (`Any`)
- `padding_mask_crop` (`Any`)
- `dtype` (`dtype`): The dtype of the model inputs.
- `generator` (`Any`)
- `preprocess_kwargs` (`dict | None`): A kwargs dictionary that, if specified, is passed along to the image processor defined under `self.image_processor` in `diffusers.image_processor.VaeImageProcessor`.
- `num_images_per_prompt` (`Any`, default: `1`)
- `ip_adapter_embeds` (`list`): Pre-generated image embeddings for IP-Adapter. Can be generated in the ip_adapter step.
- `negative_ip_adapter_embeds` (`list`): Pre-generated negative image embeddings for IP-Adapter. Can be generated in the ip_adapter step.
- `num_inference_steps` (`Any`, default: `50`)
- `timesteps` (`Any`)
- `sigmas` (`Any`)
- `denoising_end` (`Any`)
- `strength` (`Any`, default: `0.3`)
- `denoising_start` (`Any`)
- `image_latents` (`Tensor`): The latents representing the reference image for image-to-image/inpainting generation. Can be generated in the vae_encode step.
- `mask` (`Tensor`): The mask for the inpainting generation. Can be generated in the vae_encode step.
- `masked_image_latents` (`Tensor`): The masked image latents for the inpainting generation (only for an inpainting-specific UNet). Can be generated in the vae_encode step.
- `original_size` (`Any`)
- `target_size` (`Any`)
- `negative_original_size` (`Any`)
- `negative_target_size` (`Any`)
- `crops_coords_top_left` (`Any`, default: `(0, 0)`)
- `negative_crops_coords_top_left` (`Any`, default: `(0, 0)`)
- `aesthetic_score` (`Any`, default: `6.0`)
- `negative_aesthetic_score` (`Any`, default: `2.0`)
- `control_image` (`Any`)
- `control_mode` (`Any`)
- `control_guidance_start` (`Any`, default: `0.0`)
- `control_guidance_end` (`Any`, default: `1.0`)
- `controlnet_conditioning_scale` (`Any`, default: `1.0`)
- `guess_mode` (`Any`, default: `False`)
- `crops_coords` (`tuple[int] | None`): The crop coordinates used to preprocess/postprocess the image and mask, for the inpainting task only. Can be generated in the vae_encode step.
- `controlnet_cond` (`Tensor`): The control image to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- `conditioning_scale` (`float`): The ControlNet conditioning scale value to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- `controlnet_keep` (`list`): The ControlNet keep values to use for the denoising process. Can be generated in the prepare_controlnet_inputs step.
- `denoiser_input_fields` (`Any`): All conditional model inputs that need to be prepared with the guider. Should contain `prompt_embeds`/`negative_prompt_embeds`, `add_time_ids`/`negative_add_time_ids`, `pooled_prompt_embeds`/`negative_pooled_prompt_embeds`, and optionally `ip_adapter_embeds`/`negative_ip_adapter_embeds`. Add `kwargs_type=denoiser_input_fields` to their parameter spec (`OutputParam`) when they are created and added to the pipeline state.
- `eta` (`Any`, default: `0.0`)
- `output_type` (`Any`, default: `pil`)
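The `strength`/`num_inference_steps` pair and the `control_guidance_start`/`control_guidance_end` pair interact in ways a flat parameter list cannot show. The sketch below mirrors the usual Stable Diffusion conventions; this is an assumption about this pipeline's internals, and the helper names are invented:

```python
# Sketch of two standard Stable Diffusion input interactions
# (illustrative, not the Diffusers source).

def steps_to_run(num_inference_steps: int, strength: float) -> int:
    """For img2img/inpainting, only the last `strength` fraction of the
    noise schedule is actually denoised."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start

def controlnet_keep(num_steps: int, start: float = 0.0, end: float = 1.0) -> list:
    """Per-step ControlNet on/off weights derived from
    control_guidance_start / control_guidance_end."""
    return [
        1.0 - float(i / num_steps < start or (i + 1) / num_steps > end)
        for i in range(num_steps)
    ]
```

With the defaults shown above (`num_inference_steps=50`, `strength=0.3`), an img2img call would execute only 15 denoising steps, while `control_guidance_start=0.0` / `control_guidance_end=1.0` keeps the ControlNet active for every step.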
## Outputs

- `images` (`list`): Generated images.