This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
**Pipeline Type:** `SequentialPipelineBlocks`

## Description

This pipeline uses a 9-block architecture that can be customized and extended.
## Example Usage
[TODO]
## Pipeline Architecture

This modular pipeline is composed of the following blocks:
- `depth` (`DepthProcessorBlock`)
- `text_encoder` (`StableDiffusionXLTextEncoderStep`) - Text encoder step that generates the text embeddings used to guide image generation
- `denoise.input` (`StableDiffusionXLInputStep`) - Input processing step
- `denoise.before_denoise.set_timesteps` (`StableDiffusionXLSetTimestepsStep`) - Step that sets the scheduler's timesteps for inference
- `denoise.before_denoise.prepare_latents` (`StableDiffusionXLPrepareLatentsStep`) - Step that prepares the latents for the text-to-image generation process
- `denoise.before_denoise.prepare_add_cond` (`StableDiffusionXLPrepareAdditionalConditioningStep`) - Step that prepares the additional conditioning for the text-to-image generation process
- `denoise.controlnet_input` (`StableDiffusionXLControlNetInputStep`) - Step that prepares the inputs for the ControlNet
- `denoise.denoise` (`StableDiffusionXLControlNetDenoiseStep`) - Denoise step that iteratively denoises the latents with the ControlNet. Its loop is composed of the following sub-blocks, each of which should be used to compose the `sub_blocks` attribute of a `LoopSequentialPipelineBlocks` object (e.g. `StableDiffusionXLDenoiseLoopWrapper`):
  - `before_denoiser` (`StableDiffusionXLLoopBeforeDenoiser`) - Step within the denoising loop that prepares the latent input for the denoiser
  - `denoiser` (`StableDiffusionXLControlNetLoopDenoiser`) - Step within the denoising loop that denoises the latents with guidance (with ControlNet)
  - `after_denoiser` (`StableDiffusionXLLoopAfterDenoiser`) - Step within the denoising loop that updates the latents
- `decode` (`StableDiffusionXLDecodeStep`) - Step that decodes the denoised latents into images
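Conceptually, a `SequentialPipelineBlocks` runs its blocks in order, with each block reading from and writing to a shared pipeline state. The control flow can be sketched as follows (the block classes and `state` layout here are illustrative stand-ins, not the actual Diffusers API):

```python
# Minimal sketch of sequential block chaining over a shared state.
# Block names and the state layout are hypothetical; the real Diffusers
# modular pipeline uses richer block and state abstractions.

class Block:
    def __call__(self, state: dict) -> dict:
        raise NotImplementedError

class SetTimesteps(Block):
    """Stand-in for a set_timesteps-style step."""
    def __call__(self, state):
        n = state["num_inference_steps"]
        # Evenly spaced timesteps from high noise to low noise.
        state["timesteps"] = [1000 * (n - i) // n for i in range(n)]
        return state

class PrepareLatents(Block):
    """Stand-in for a prepare_latents-style step (8x VAE downscale)."""
    def __call__(self, state):
        state["latents"] = [0.0] * ((state["height"] // 8) * (state["width"] // 8))
        return state

class SequentialBlocks:
    def __init__(self, blocks):
        self.blocks = blocks

    def __call__(self, state):
        for block in self.blocks:  # each block mutates the shared state
            state = block(state)
        return state

pipe = SequentialBlocks([SetTimesteps(), PrepareLatents()])
out = pipe({"num_inference_steps": 4, "height": 64, "width": 64})
print(out["timesteps"])  # → [1000, 750, 500, 250]
```

Because every block communicates only through the state, blocks can be reordered, replaced, or extended without touching the others, which is what makes the 9-block architecture above customizable.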
## Model Components
- `depth_processor` (`DepthPreprocessor`) [pretrained_model_name_or_path=depth-anything/Depth-Anything-V2-Large-hf]
- `text_encoder` (`CLIPTextModel`)
- `text_encoder_2` (`CLIPTextModelWithProjection`)
- `tokenizer` (`CLIPTokenizer`)
- `tokenizer_2` (`CLIPTokenizer`)
- `guider` (`ClassifierFreeGuidance`)
- `scheduler` (`EulerDiscreteScheduler`)
- `vae` (`AutoencoderKL`)
- `unet` (`UNet2DConditionModel`)
- `controlnet` (`ControlNetModel`)
- `control_image_processor` (`VaeImageProcessor`)
- `image_processor` (`VaeImageProcessor`)
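The `guider` component applies classifier-free guidance when combining the conditional and unconditional noise predictions. The core rule can be sketched on plain floats (the real guider operates on full noise-prediction tensors):

```python
# Sketch of the classifier-free guidance combine rule used by guiders
# such as ClassifierFreeGuidance. Shown on scalars for clarity.

def cfg_combine(uncond_pred, cond_pred, guidance_scale):
    # Push the prediction away from the unconditional branch toward the
    # text-conditional one; scale > 1 strengthens prompt adherence.
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

print(cfg_combine(0.0, 1.0, 1.0))  # → 1.0 (scale 1 reduces to the conditional prediction)
print(cfg_combine(0.0, 1.0, 7.5))  # → 7.5 (extrapolates past the conditional prediction)
```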
## Configuration Parameters

- `force_zeros_for_empty_prompt` (default: `True`)
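When `force_zeros_for_empty_prompt` is set and the negative prompt is empty, SDXL-style pipelines use all-zero negative embeddings instead of encoding the empty string. A sketch of that branch (the `encode_text` callable here is a hypothetical stand-in for the text-encoder step):

```python
# Sketch of the force_zeros_for_empty_prompt branch. The encode_text
# argument is a hypothetical stand-in; the real logic lives inside the
# text-encoder step of the pipeline.

def negative_embeds(negative_prompt, embed_dim, force_zeros, encode_text):
    if force_zeros and not negative_prompt:
        # Use an all-zeros embedding rather than encoding "".
        return [0.0] * embed_dim
    return encode_text(negative_prompt or "")

# Toy text encoder: embedding value depends on the prompt length.
fake_encoder = lambda text: [1.0 + len(text)] * 4

print(negative_embeds("", 4, True, fake_encoder))        # → [0.0, 0.0, 0.0, 0.0]
print(negative_embeds("", 4, False, fake_encoder))       # → [1.0, 1.0, 1.0, 1.0]
print(negative_embeds("blurry", 4, True, fake_encoder))  # → [7.0, 7.0, 7.0, 7.0]
```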
## Input/Output Specification

### Inputs

**Required:**

- `image` (`Any`): Image(s) to use to extract depth maps
- `control_image` (`Any`): No description provided

**Optional:**

- `prompt` (`Any`): No description provided
- `prompt_2` (`Any`): No description provided
- `negative_prompt` (`Any`): No description provided
- `negative_prompt_2` (`Any`): No description provided
- `cross_attention_kwargs` (`Any`): No description provided
- `clip_skip` (`Any`): No description provided
- `num_images_per_prompt` (`Any`, default: `1`): No description provided
- `ip_adapter_embeds` (`list`): Pre-generated image embeddings for IP-Adapter. Can be generated from the ip_adapter step.
- `negative_ip_adapter_embeds` (`list`): Pre-generated negative image embeddings for IP-Adapter. Can be generated from the ip_adapter step.
- `num_inference_steps` (`Any`, default: `50`): No description provided
- `timesteps` (`Any`): No description provided
- `sigmas` (`Any`): No description provided
- `denoising_end` (`Any`): No description provided
- `height` (`Any`): No description provided
- `width` (`Any`): No description provided
- `latents` (`Any`): No description provided
- `generator` (`Any`): No description provided
- `original_size` (`Any`): No description provided
- `target_size` (`Any`): No description provided
- `negative_original_size` (`Any`): No description provided
- `negative_target_size` (`Any`): No description provided
- `crops_coords_top_left` (`Any`, default: `(0, 0)`): No description provided
- `negative_crops_coords_top_left` (`Any`, default: `(0, 0)`): No description provided
- `control_guidance_start` (`Any`, default: `0.0`): No description provided
- `control_guidance_end` (`Any`, default: `1.0`): No description provided
- `controlnet_conditioning_scale` (`Any`, default: `1.0`): No description provided
- `guess_mode` (`Any`, default: `False`): No description provided
- `crops_coords` (`tuple[int] | None`): The crop coordinates to use to preprocess/postprocess the image and mask, for the inpainting task only. Can be generated in the vae_encode step.
- `None` (`Any`): All conditional model inputs that need to be prepared with the guider. Should contain `prompt_embeds`/`negative_prompt_embeds`, `add_time_ids`/`negative_add_time_ids`, `pooled_prompt_embeds`/`negative_pooled_prompt_embeds`, and optionally `ip_adapter_embeds`/`negative_ip_adapter_embeds`. Please add `kwargs_type=denoiser_input_fields` to their parameter spec (`OutputParam`) when they are created and added to the pipeline state.
- `eta` (`Any`, default: `0.0`): No description provided
- `output_type` (`Any`, default: `pil`): No description provided
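The `control_guidance_start`/`control_guidance_end` inputs define the fraction of the denoising schedule during which the ControlNet is active; they are turned into a per-step `controlnet_keep` list. A sketch of that computation, modeled on Diffusers' ControlNet pipelines (treat the exact formula as an assumption for this pipeline):

```python
# Sketch of how control_guidance_start/end become per-step keep values.
# Modeled on the computation in Diffusers' ControlNet pipelines; the
# real step also handles per-ControlNet lists of start/end values.

def compute_controlnet_keep(num_steps, start, end):
    keep = []
    for i in range(num_steps):
        # The ControlNet is active only while the current progress
        # through the schedule falls inside [start, end].
        inactive = (i / num_steps) < start or ((i + 1) / num_steps) > end
        keep.append(0.0 if inactive else 1.0)
    return keep

print(compute_controlnet_keep(4, 0.0, 0.5))  # → [1.0, 1.0, 0.0, 0.0]
print(compute_controlnet_keep(4, 0.5, 1.0))  # → [0.0, 0.0, 1.0, 1.0]
```

With the defaults (`0.0` and `1.0`) the ControlNet stays active for every denoising step.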
### Outputs

- `prompt_embeds` (`Tensor`): Text embeddings used to guide the image generation
- `negative_prompt_embeds` (`Tensor`): Negative text embeddings used to guide the image generation
- `pooled_prompt_embeds` (`Tensor`): Pooled text embeddings used to guide the image generation
- `negative_pooled_prompt_embeds` (`Tensor`): Negative pooled text embeddings used to guide the image generation
- `batch_size` (`int`): Number of prompts; the final batch size of model inputs is `batch_size * num_images_per_prompt`
- `dtype` (`dtype`): Data type of model tensor inputs (determined by `prompt_embeds`)
- `ip_adapter_embeds` (`list`): Image embeddings for IP-Adapter
- `negative_ip_adapter_embeds` (`list`): Negative image embeddings for IP-Adapter
- `timesteps` (`Tensor`): The timesteps to use for inference
- `num_inference_steps` (`int`): The number of denoising steps to perform at inference time
- `latents` (`Tensor`): The initial latents to use for the denoising process
- `add_time_ids` (`Tensor`): The time ids to condition the denoising process
- `negative_add_time_ids` (`Tensor`): The negative time ids to condition the denoising process
- `timestep_cond` (`Tensor`): The timestep conditioning to use for LCM
- `controlnet_cond` (`Tensor`): The processed control image
- `control_guidance_start` (`list`): The ControlNet guidance start values
- `control_guidance_end` (`list`): The ControlNet guidance end values
- `conditioning_scale` (`list`): The ControlNet conditioning scale values
- `guess_mode` (`bool`): Whether guess mode is used
- `controlnet_keep` (`list`): The ControlNet keep values
- `images` (`list[PIL.Image.Image] | list[torch.Tensor] | list[numpy.ndarray]`): The generated images, as `PIL.Image.Image`, `torch.Tensor`, or NumPy arrays
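The `add_time_ids` output follows SDXL's micro-conditioning convention: `original_size`, `crops_coords_top_left`, and `target_size` are concatenated into six integers. A sketch of the packing (the real step builds a tensor and repeats it across the batch):

```python
# Sketch of SDXL micro-conditioning time ids. The real
# prepare_add_cond step packs these into a tensor and also builds the
# negative variant from negative_original_size/negative_target_size.

def make_add_time_ids(original_size, crops_coords_top_left, target_size):
    # Six integers, concatenated in this fixed order.
    return list(original_size) + list(crops_coords_top_left) + list(target_size)

print(make_add_time_ids((1024, 1024), (0, 0), (1024, 1024)))
# → [1024, 1024, 0, 0, 1024, 1024]
```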