ZImageTransformer2DModel
A Transformer model for image-like data from Z-Image.
ZImageTransformer2DModel[[diffusers.ZImageTransformer2DModel]]
diffusers.ZImageTransformer2DModel[[diffusers.ZImageTransformer2DModel]]
forward[[diffusers.ZImageTransformer2DModel.forward]]
Source: https://github.com/huggingface/diffusers/blob/vr_13751/src/diffusers/models/transformers/transformer_z_image.py#L894
forward(x: list, t, cap_feats: list, return_dict: bool = True, controlnet_block_samples: dict[int, torch.Tensor] | None = None, siglip_feats: list[list[torch.Tensor]] | None = None, image_noise_mask: list[list[int]] | None = None, patch_size: int = 2, f_patch_size: int = 1)
The ZImageTransformer2DModel forward method.
Flow: patchify -> t_embed -> x_embed -> x_refine -> cap_embed -> cap_refine -> [siglip_embed -> siglip_refine] -> build_unified -> main_layers -> final_layer -> unpatchify
Parameters:
x (list of torch.Tensor or nested list of torch.Tensor) : Input latents. A flat list when running in standard mode, or a nested list when running in omni mode.
t (torch.Tensor) : Timestep tensor indicating the current denoising step.
cap_feats (list of torch.Tensor or nested list of torch.Tensor) : Conditional caption embeddings (embeddings computed from the input conditions such as prompts) to use.
return_dict (bool, optional, defaults to True) : Whether to return a models.transformer_2d.Transformer2DModelOutput instead of a plain tuple.
controlnet_block_samples (dict of int to torch.Tensor, optional) : A mapping from transformer block index to a tensor. When specified, each tensor is added to the residual of the corresponding transformer block.
siglip_feats (list of list of torch.Tensor, optional) : Optional SigLIP image features used as additional conditioning.
image_noise_mask (list of list of int, optional) : Per-image noise masks indicating noisy vs. clean tokens in omni mode.
patch_size (int, optional, defaults to 2) : Spatial patch size used to patchify the input latents.
f_patch_size (int, optional, defaults to 1) : Temporal patch size used to patchify the input latents.
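To make the effect of patch_size and f_patch_size concrete, the following is a minimal sketch of how patchifying could change the sequence length and per-token feature size. It assumes a latent laid out as (channels, frames, height, width) and non-overlapping patches; that layout, the 16-channel example, and the helper name patch_grid are illustrative assumptions and not part of the diffusers API.

```python
def patch_grid(latent_shape, patch_size=2, f_patch_size=1):
    """Return (num_tokens, token_dim) for a latent of shape (C, F, H, W).

    Hypothetical helper for illustration only: it assumes non-overlapping
    patches of size f_patch_size x patch_size x patch_size.
    """
    channels, frames, height, width = latent_shape
    if frames % f_patch_size or height % patch_size or width % patch_size:
        raise ValueError("latent dimensions must be divisible by the patch sizes")

    # Number of patches along each axis.
    f_tokens = frames // f_patch_size
    h_tokens = height // patch_size
    w_tokens = width // patch_size

    # Each patch becomes one token; its features are the flattened patch values.
    num_tokens = f_tokens * h_tokens * w_tokens
    token_dim = f_patch_size * patch_size * patch_size * channels
    return num_tokens, token_dim


# Assumed example: a single-frame, 16-channel, 128x128 latent with the defaults.
print(patch_grid((16, 1, 128, 128)))  # (4096, 64)
```

Under these assumptions, doubling patch_size cuts the token count by a factor of four while making each token four times wider, which is the usual trade-off between sequence length and per-token dimension.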
patchify_and_embed[[diffusers.ZImageTransformer2DModel.patchify_and_embed]]
Patchify for basic mode: single image per batch item.
patchify_and_embed_omni[[diffusers.ZImageTransformer2DModel.patchify_and_embed_omni]]
Patchify for omni mode: multiple images per batch item with noise masks.