This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

Pipeline Type: SequentialPipelineBlocks

Description:

This pipeline uses a 9-block architecture that can be customized and extended.

Example Usage

[TODO]
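Pending the TODO above, here is a minimal usage sketch based on Diffusers' modular pipeline loading API. The repository id, image paths, and prompt are placeholders, and the exact loading calls (`ModularPipeline.from_pretrained`, `load_default_components`) are assumed from the Diffusers modular pipelines documentation, not taken from this card:

```python
import torch
from diffusers import ModularPipeline
from diffusers.utils import load_image

# "<repo-id>" is a placeholder; replace with this model card's repository id.
pipeline = ModularPipeline.from_pretrained("<repo-id>", trust_remote_code=True)
pipeline.load_default_components(torch_dtype=torch.float16)
pipeline.to("cuda")

# `image` is the input the depth block extracts a depth map from;
# `control_image` is the ControlNet conditioning image (both required).
image = load_image("<path-or-url-to-image>")
control_image = load_image("<path-or-url-to-control-image>")

images = pipeline(
    prompt="a photo of an astronaut riding a horse",
    image=image,
    control_image=control_image,
    num_inference_steps=50,
    output="images",
)
images[0].save("result.png")
```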

Pipeline Architecture

This modular pipeline is composed of the following blocks:

  1. depth (DepthProcessorBlock)
  2. text_encoder (StableDiffusionXLTextEncoderStep)
    • Text Encoder step that generates text_embeddings to guide the image generation
  3. denoise.input (StableDiffusionXLInputStep)
    • Input processing step that determines batch_size and dtype from the text embeddings and expands the model inputs to match batch_size * num_images_per_prompt
  4. denoise.before_denoise.set_timesteps (StableDiffusionXLSetTimestepsStep)
    • Step that sets the scheduler's timesteps for inference
  5. denoise.before_denoise.prepare_latents (StableDiffusionXLPrepareLatentsStep)
    • Prepare latents step that prepares the latents for the text-to-image generation process
  6. denoise.before_denoise.prepare_add_cond (StableDiffusionXLPrepareAdditionalConditioningStep)
    • Step that prepares the additional conditioning for the text-to-image generation process
  7. denoise.controlnet_input (StableDiffusionXLControlNetInputStep)
    • Step that prepares the inputs for ControlNet
  8. denoise.denoise (StableDiffusionXLControlNetDenoiseStep)
    • Denoise step that iteratively denoises the latents with ControlNet
    • before_denoiser: StableDiffusionXLLoopBeforeDenoiser
      • Step within the denoising loop that prepares the latent input for the denoiser. This block should be used to compose the sub_blocks attribute of a LoopSequentialPipelineBlocks object (e.g. StableDiffusionXLDenoiseLoopWrapper)
    • denoiser: StableDiffusionXLControlNetLoopDenoiser
      • Step within the denoising loop that denoises the latents with guidance (with ControlNet). This block should be used to compose the sub_blocks attribute of a LoopSequentialPipelineBlocks object (e.g. StableDiffusionXLDenoiseLoopWrapper)
    • after_denoiser: StableDiffusionXLLoopAfterDenoiser
      • Step within the denoising loop that updates the latents. This block should be used to compose the sub_blocks attribute of a LoopSequentialPipelineBlocks object (e.g. StableDiffusionXLDenoiseLoopWrapper)
  9. decode (StableDiffusionXLDecodeStep)
    • Step that decodes the denoised latents into images
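The core idea of SequentialPipelineBlocks is that each block reads from and writes to a shared pipeline state, and blocks run in order. A toy illustration of that pattern (pure Python, not the actual Diffusers classes; the block names are stand-ins):

```python
# Toy sketch of sequential block composition: each block mutates a shared
# state dict, and the pipeline runs the blocks in order.
class TextEncoderBlock:
    def __call__(self, state):
        # Stand-in for StableDiffusionXLTextEncoderStep: turn the prompt
        # into "embeddings" (here, just token lengths).
        state["prompt_embeds"] = [len(tok) for tok in state["prompt"].split()]
        return state

class DenoiseBlock:
    def __call__(self, state):
        # Stand-in for the denoising loop: refine the "latents" once per step.
        latents = state["latents"]
        for _ in range(state["num_inference_steps"]):
            latents = [x * 0.5 for x in latents]
        state["latents"] = latents
        return state

class SequentialBlocks:
    def __init__(self, blocks):
        self.blocks = blocks

    def __call__(self, **inputs):
        state = dict(inputs)
        for block in self.blocks:
            state = block(state)
        return state

pipeline = SequentialBlocks([TextEncoderBlock(), DenoiseBlock()])
out = pipeline(prompt="a photo", latents=[8.0], num_inference_steps=3)
# out carries everything every block produced, mirroring how intermediate
# values (prompt_embeds, latents, ...) flow between the real blocks.
```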

Model Components

  1. depth_processor (DepthPreprocessor) [pretrained_model_name_or_path=depth-anything/Depth-Anything-V2-Large-hf]
  2. text_encoder (CLIPTextModel)
  3. text_encoder_2 (CLIPTextModelWithProjection)
  4. tokenizer (CLIPTokenizer)
  5. tokenizer_2 (CLIPTokenizer)
  6. guider (ClassifierFreeGuidance)
  7. scheduler (EulerDiscreteScheduler)
  8. vae (AutoencoderKL)
  9. unet (UNet2DConditionModel)
  10. controlnet (ControlNetModel)
  11. control_image_processor (VaeImageProcessor)
  12. image_processor (VaeImageProcessor)

Configuration Parameters

force_zeros_for_empty_prompt (default: True)
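In SDXL-style pipelines, `force_zeros_for_empty_prompt` controls whether an empty (negative) prompt is replaced by all-zero embeddings instead of the embeddings of the empty string. A small illustrative sketch (the helper name is hypothetical):

```python
# Hypothetical helper illustrating force_zeros_for_empty_prompt: if the
# prompt is empty and the flag is set, return zeros of the same length
# instead of the encoder's embedding of "".
def maybe_zero_embeds(prompt, embeds, force_zeros_for_empty_prompt=True):
    if force_zeros_for_empty_prompt and prompt.strip() == "":
        return [0.0] * len(embeds)
    return embeds

neg_embeds = maybe_zero_embeds("", [0.3, -0.7, 0.1])
```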

Input/Output Specification

Inputs Required:

  • image (Any): Image(s) to use to extract depth maps
  • control_image (Any): No description provided

Optional:

  • prompt (Any): No description provided
  • prompt_2 (Any): No description provided
  • negative_prompt (Any): No description provided
  • negative_prompt_2 (Any): No description provided
  • cross_attention_kwargs (Any): No description provided
  • clip_skip (Any): No description provided
  • num_images_per_prompt (Any), default: 1: No description provided
  • ip_adapter_embeds (list): Pre-generated image embeddings for IP-Adapter. Can be generated from ip_adapter step.
  • negative_ip_adapter_embeds (list): Pre-generated negative image embeddings for IP-Adapter. Can be generated from ip_adapter step.
  • num_inference_steps (Any), default: 50: No description provided
  • timesteps (Any): No description provided
  • sigmas (Any): No description provided
  • denoising_end (Any): No description provided
  • height (Any): No description provided
  • width (Any): No description provided
  • latents (Any): No description provided
  • generator (Any): No description provided
  • original_size (Any): No description provided
  • target_size (Any): No description provided
  • negative_original_size (Any): No description provided
  • negative_target_size (Any): No description provided
  • crops_coords_top_left (Any), default: (0, 0): No description provided
  • negative_crops_coords_top_left (Any), default: (0, 0): No description provided
  • control_guidance_start (Any), default: 0.0: No description provided
  • control_guidance_end (Any), default: 1.0: No description provided
  • controlnet_conditioning_scale (Any), default: 1.0: No description provided
  • guess_mode (Any), default: False: No description provided
  • crops_coords (tuple[int] | None): The crop coordinates to use when preprocessing/postprocessing the image and mask (inpainting task only). Can be generated in the vae_encode step.
  • None (Any): All conditional model inputs that need to be prepared with the guider. This should contain prompt_embeds/negative_prompt_embeds, add_time_ids/negative_add_time_ids, pooled_prompt_embeds/negative_pooled_prompt_embeds, and optionally ip_adapter_embeds/negative_ip_adapter_embeds. Add kwargs_type=denoiser_input_fields to their parameter spec (OutputParam) when they are created and added to the pipeline state.
  • eta (Any), default: 0.0: No description provided
  • output_type (Any), default: pil: No description provided

Outputs:

  • prompt_embeds (Tensor): text embeddings used to guide the image generation

  • negative_prompt_embeds (Tensor): negative text embeddings used to guide the image generation
  • pooled_prompt_embeds (Tensor): pooled text embeddings used to guide the image generation
  • negative_pooled_prompt_embeds (Tensor): negative pooled text embeddings used to guide the image generation
  • batch_size (int): Number of prompts, the final batch size of model inputs should be batch_size * num_images_per_prompt
  • dtype (dtype): Data type of model tensor inputs (determined by prompt_embeds)
  • ip_adapter_embeds (list): image embeddings for IP-Adapter
  • negative_ip_adapter_embeds (list): negative image embeddings for IP-Adapter
  • timesteps (Tensor): The timesteps to use for inference
  • num_inference_steps (int): The number of denoising steps to perform at inference time
  • latents (Tensor): The initial latents to use for the denoising process
  • add_time_ids (Tensor): The time ids to condition the denoising process
  • negative_add_time_ids (Tensor): The negative time ids to condition the denoising process
  • timestep_cond (Tensor): The timestep cond to use for LCM
  • controlnet_cond (Tensor): The processed control image
  • control_guidance_start (list): The controlnet guidance start values
  • control_guidance_end (list): The controlnet guidance end values
  • conditioning_scale (list): The controlnet conditioning scale values
  • guess_mode (bool): Whether guess mode is used
  • controlnet_keep (list): The controlnet keep values
  • images (list[PIL.Image.Image] | list[torch.Tensor] | list[numpy.ndarray]): The generated images, as PIL images, torch tensors, or NumPy arrays
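The controlnet_keep output determines, per denoising step, whether ControlNet conditioning is applied, based on control_guidance_start/control_guidance_end (expressed as fractions of the schedule). An illustrative computation mirroring the logic used in Diffusers' SDXL ControlNet pipelines (the function name is hypothetical):

```python
# For each denoising step i, keep ControlNet active only while the step
# falls inside [start, end] as fractions of the total schedule.
def compute_controlnet_keep(num_inference_steps, starts, ends):
    keeps = []
    for i in range(num_inference_steps):
        keeps.append([
            1.0 - float(i / num_inference_steps < s or (i + 1) / num_inference_steps > e)
            for s, e in zip(starts, ends)
        ])
    return keeps

# With start=0.0 and end=0.5, ControlNet applies to the first half of the steps.
keep = compute_controlnet_keep(4, starts=[0.0], ends=[0.5])
```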