HunyuanDiT2DModel

A Diffusion Transformer model for 2D data from Hunyuan-DiT.

HunyuanDiT2DModel[[diffusers.HunyuanDiT2DModel]]

diffusers.HunyuanDiT2DModel[[diffusers.HunyuanDiT2DModel]]

Source

HunYuanDiT: Diffusion model with a Transformer backbone.

Inherits from ModelMixin and ConfigMixin to be compatible with the StableDiffusionPipeline sampler of diffusers.

enable_forward_chunking[[diffusers.HunyuanDiT2DModel.enable_forward_chunking]]

Source: https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/transformers/hunyuan_transformer_2d.py#L470

enable_forward_chunking(chunk_size: int | None = None, dim: int = 0)

Sets the attention processor to use feed-forward chunking.

Parameters:

chunk_size (int, optional) : The chunk size of the feed-forward layers. If not specified, the feed-forward layer runs individually over each tensor of dim=dim.

dim (int, optional, defaults to 0) : The dimension over which the feed-forward computation should be chunked. Choose between dim=0 (batch) or dim=1 (sequence length).
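The idea behind feed-forward chunking can be sketched in plain PyTorch: because a feed-forward layer acts on each position independently, it can be applied to slices of the input along the batch or sequence dimension and the results concatenated, which lowers peak memory at some cost in speed. The helper `chunked_feed_forward` and the toy `ff` module below are illustrative assumptions, not the diffusers implementation.

```python
import torch
from torch import nn


def chunked_feed_forward(ff: nn.Module, hidden_states: torch.Tensor, chunk_size: int, dim: int = 0) -> torch.Tensor:
    """Apply `ff` to slices of `hidden_states` along `dim`, then concatenate (hypothetical helper)."""
    if hidden_states.shape[dim] % chunk_size != 0:
        raise ValueError(
            f"Size {hidden_states.shape[dim]} along dim={dim} must be divisible by chunk_size={chunk_size}."
        )
    num_chunks = hidden_states.shape[dim] // chunk_size
    # Each slice is processed on its own, so intermediate activations stay small.
    outputs = [ff(chunk) for chunk in hidden_states.chunk(num_chunks, dim=dim)]
    return torch.cat(outputs, dim=dim)


torch.manual_seed(0)
# Toy feed-forward block standing in for the one inside a transformer block.
ff = nn.Sequential(nn.Linear(8, 32), nn.GELU(), nn.Linear(32, 8))
x = torch.randn(4, 16, 8)  # (batch, sequence length, channels)

with torch.no_grad():
    full = ff(x)
    by_batch = chunked_feed_forward(ff, x, chunk_size=1, dim=0)     # chunk over batch
    by_sequence = chunked_feed_forward(ff, x, chunk_size=4, dim=1)  # chunk over sequence length

same_by_batch = torch.allclose(full, by_batch, atol=1e-5)
same_by_sequence = torch.allclose(full, by_sequence, atol=1e-5)
```

Both chunked results match the unchunked output up to floating-point tolerance, which is why chunking can be enabled without retraining.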

Parameters (HunyuanDiT2DModel):

num_attention_heads (int, optional, defaults to 16) : The number of heads to use for multi-head attention.

attention_head_dim (int, optional, defaults to 88) : The number of channels in each head.

in_channels (int, optional) : The number of channels in the input and output (specify if the input is continuous).

patch_size (int, optional) : The size of the patch to use for the input.

activation_fn (str, optional, defaults to "geglu") : Activation function to use in feed-forward.

sample_size (int, optional) : The width of the latent images. This is fixed during training since it is used to learn a number of position embeddings.

dropout (float, optional, defaults to 0.0) : The dropout probability to use.

cross_attention_dim (int, optional) : The number of dimensions in the CLIP text embedding.

hidden_size (int, optional) : The size of hidden layer in the conditioning embedding layers.

num_layers (int, optional, defaults to 1) : The number of layers of Transformer blocks to use.

mlp_ratio (float, optional, defaults to 4.0) : The ratio of the hidden layer size to the input size.

learn_sigma (bool, optional, defaults to True) : Whether to predict variance.

cross_attention_dim_t5 (int, optional) : The number of dimensions in the T5 text embedding.

pooled_projection_dim (int, optional) : The size of the pooled projection.

text_len (int, optional) : The length of the CLIP text embedding.

text_len_t5 (int, optional) : The length of the T5 text embedding.

use_style_cond_and_image_meta_size (bool, optional) : Whether or not to use style condition and image meta size. True for version <= 1.1, False for version >= 1.2.
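As a rough guide to how these parameters interact, the sketch below derives a few sizes from a configuration. It assumes relationships that are typical of DiT-style models (inner width equals heads times head dimension, one token per non-overlapping patch, and doubled output channels when variance is predicted); these are assumptions and are not stated in this reference. `HypotheticalDiTConfig` and the values for `in_channels`, `patch_size`, and `sample_size` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class HypotheticalDiTConfig:
    """Illustrative subset of the constructor arguments (not the real config class)."""
    num_attention_heads: int = 16   # documented default
    attention_head_dim: int = 88    # documented default
    in_channels: int = 4            # assumed value for illustration
    patch_size: int = 2             # assumed value for illustration
    sample_size: int = 128          # assumed value for illustration
    learn_sigma: bool = True        # documented default


def derived_sizes(cfg: HypotheticalDiTConfig) -> dict:
    # Assumption: the transformer width is heads * channels per head.
    inner_dim = cfg.num_attention_heads * cfg.attention_head_dim
    # Assumption: the square latent is split into non-overlapping square patches.
    patches_per_side = cfg.sample_size // cfg.patch_size
    num_tokens = patches_per_side ** 2
    # Assumption: predicting variance doubles the number of output channels.
    out_channels = cfg.in_channels * 2 if cfg.learn_sigma else cfg.in_channels
    return {"inner_dim": inner_dim, "num_tokens": num_tokens, "out_channels": out_channels}


sizes = derived_sizes(HypotheticalDiTConfig())
print(sizes)
```

Because `sample_size` fixes the number of position embeddings learned during training, changing it after training changes `num_tokens` and generally requires matching position embeddings.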

forward[[diffusers.HunyuanDiT2DModel.forward]]

Source

The HunyuanDiT2DModel forward method.

Parameters:

hidden_states (torch.Tensor of shape (batch size, dim, height, width)) : The input tensor.

timestep (torch.LongTensor, optional) : Used to indicate the denoising step.

encoder_hidden_states (torch.Tensor of shape (batch size, sequence len, embed dims), optional) : Conditional embeddings for the cross-attention layer. This is the output of BertModel.

text_embedding_mask (torch.Tensor) : An attention mask of shape (batch, key_tokens) applied to encoder_hidden_states. This is the output of BertModel.

encoder_hidden_states_t5 (torch.Tensor of shape (batch size, sequence len, embed dims), optional) : Conditional embeddings for the cross-attention layer. This is the output of the T5 text encoder.

text_embedding_mask_t5 (torch.Tensor) : An attention mask of shape (batch, key_tokens) applied to encoder_hidden_states_t5. This is the output of the T5 text encoder.

image_meta_size (torch.Tensor) : Conditional embedding indicating the image sizes.

style (torch.Tensor) : Conditional embedding indicating the style.

image_rotary_emb (torch.Tensor) : The image rotary embeddings to apply on query and key tensors during attention calculation.

return_dict (bool) : Whether to return a dictionary.
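To make the expected tensor shapes concrete, the sketch below builds dummy inputs that follow the shapes listed above. All sizes (4 latent channels, a 128x128 latent, 77 and 256 text tokens, 1024 and 2048 embedding dimensions, 1000 timesteps) are assumptions chosen for illustration, and `image_meta_size`, `style`, and `image_rotary_emb` are omitted. The model call is left commented out because it needs a loaded `HunyuanDiT2DModel` whose configuration matches these sizes.

```python
import torch

batch = 2  # assumed batch size

# Dummy inputs keyed by the documented argument names; every size here is illustrative.
inputs = {
    "hidden_states": torch.randn(batch, 4, 128, 128),          # (batch size, dim, height, width)
    "timestep": torch.randint(0, 1000, (batch,)),              # integer (long) denoising steps
    "encoder_hidden_states": torch.randn(batch, 77, 1024),     # (batch size, sequence len, embed dims)
    "text_embedding_mask": torch.ones(batch, 77, dtype=torch.long),      # (batch, key_tokens)
    "encoder_hidden_states_t5": torch.randn(batch, 256, 2048), # T5 embeddings
    "text_embedding_mask_t5": torch.ones(batch, 256, dtype=torch.long),  # (batch, key_tokens)
    "return_dict": True,
}

# Each mask must cover exactly the tokens of its matching embedding tensor.
masks_match = (
    inputs["text_embedding_mask"].shape == inputs["encoder_hidden_states"].shape[:2]
    and inputs["text_embedding_mask_t5"].shape == inputs["encoder_hidden_states_t5"].shape[:2]
)

# With a loaded model (not created here), the call would look like:
# output = model(**inputs)
```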

fuse_qkv_projections[[diffusers.HunyuanDiT2DModel.fuse_qkv_projections]]

Source

Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.

> This API is 🧪 experimental.
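The effect of fusing projections can be illustrated with plain linear layers: stacking the query, key, and value weight matrices into one larger matrix lets a single matrix multiplication produce all three outputs. This is a minimal sketch of the general technique, assuming separate `nn.Linear` projections named `to_q`, `to_k`, and `to_v`; it is not the diffusers code path.

```python
import torch
from torch import nn

torch.manual_seed(0)
dim = 16  # assumed embedding width for illustration

# Three separate projections, as in an unfused self-attention module.
to_q, to_k, to_v = (nn.Linear(dim, dim) for _ in range(3))

# One fused projection whose weight rows are the stacked Q, K, V weights.
to_qkv = nn.Linear(dim, 3 * dim)
with torch.no_grad():
    to_qkv.weight.copy_(torch.cat([to_q.weight, to_k.weight, to_v.weight], dim=0))
    to_qkv.bias.copy_(torch.cat([to_q.bias, to_k.bias, to_v.bias], dim=0))

x = torch.randn(2, 5, dim)  # (batch, sequence length, channels)
with torch.no_grad():
    # A single matmul, then split the last dimension back into Q, K, V.
    q, k, v = to_qkv(x).chunk(3, dim=-1)
    separate = (to_q(x), to_k(x), to_v(x))

fused_matches = all(torch.allclose(a, b, atol=1e-5) for a, b in zip((q, k, v), separate))
```

For cross-attention, the query comes from a different input than the key and value, so only the key and value projections can be stacked this way.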

set_default_attn_processor[[diffusers.HunyuanDiT2DModel.set_default_attn_processor]]

Source

Disables custom attention processors and sets the default attention implementation.

unfuse_qkv_projections[[diffusers.HunyuanDiT2DModel.unfuse_qkv_projections]]

Source

Disables the fused QKV projection if enabled.

> This API is 🧪 experimental.
