HunyuanDiT2DModel
A Diffusion Transformer model for 2D data from Hunyuan-DiT.
HunyuanDiT2DModel[[diffusers.HunyuanDiT2DModel]]
diffusers.HunyuanDiT2DModel[[diffusers.HunyuanDiT2DModel]]
HunYuanDiT: Diffusion model with a Transformer backbone.
Inherits from ModelMixin and ConfigMixin to be compatible with the StableDiffusionPipeline sampler of diffusers.
Parameters:
num_attention_heads (int, optional, defaults to 16) : The number of heads to use for multi-head attention.
attention_head_dim (int, optional, defaults to 88) : The number of channels in each head.
in_channels (int, optional) : The number of channels in the input and output (specify if the input is continuous).
patch_size (int, optional) : The size of the patch to use for the input.
activation_fn (str, optional, defaults to "geglu") : Activation function to use in feed-forward.
sample_size (int, optional) : The width of the latent images. This is fixed during training since it is used to learn a number of position embeddings.
dropout (float, optional, defaults to 0.0) : The dropout probability to use.
cross_attention_dim (int, optional) : The number of dimensions in the CLIP text embedding.
hidden_size (int, optional) : The size of the hidden layer in the conditioning embedding layers.
num_layers (int, optional, defaults to 1) : The number of layers of Transformer blocks to use.
mlp_ratio (float, optional, defaults to 4.0) : The ratio of the hidden layer size to the input size.
learn_sigma (bool, optional, defaults to True) : Whether to predict variance.
cross_attention_dim_t5 (int, optional) : The number of dimensions in the T5 text embedding.
pooled_projection_dim (int, optional) : The size of the pooled projection.
text_len (int, optional) : The length of the CLIP text embedding.
text_len_t5 (int, optional) : The length of the T5 text embedding.
use_style_cond_and_image_meta_size (bool, optional) : Whether to use the style condition and image meta size. True for versions <= 1.1, False for versions >= 1.2.
enable_forward_chunking[[diffusers.HunyuanDiT2DModel.enable_forward_chunking]]
Source: https://github.com/huggingface/diffusers/blob/vr_12652/src/diffusers/models/transformers/hunyuan_transformer_2d.py#L470
enable_forward_chunking(chunk_size: int | None = None, dim: int = 0)
Sets the attention processor to use feed forward chunking.
Parameters:
chunk_size (int, optional) : The chunk size of the feed-forward layers. If not specified, the feed-forward layer runs individually over each tensor of dim=dim.
dim (int, optional, defaults to 0) : The dimension over which the feed-forward computation should be chunked. Choose between dim=0 (batch) or dim=1 (sequence length).
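As a rough illustration of what feed-forward chunking means for enable_forward_chunking, the sketch below splits a token sequence into fixed-size chunks, runs a feed-forward function on each chunk, and concatenates the results. It uses plain Python lists rather than torch tensors. `feed_forward` and `chunked_feed_forward` are hypothetical helpers, not diffusers functions, and the requirement that the sequence length be divisible by `chunk_size` is an assumption of this sketch. Because a feed-forward layer acts on each token independently, the chunked result matches the unchunked one, while fewer intermediate activations need to exist at once.

```python
def feed_forward(tokens):
    """Toy position-wise feed-forward (hypothetical stand-in for the real layer).

    Each value is expanded to 4 hidden values, passed through ReLU,
    and the 4 hidden values are summed back into one output value.
    """
    outputs = []
    for token in tokens:
        hidden = [max(0.0, x * w) for x in token for w in (1.0, -1.0, 0.5, 2.0)]
        outputs.append([sum(hidden[i * 4:(i + 1) * 4]) for i in range(len(token))])
    return outputs


def chunked_feed_forward(tokens, chunk_size):
    """Apply feed_forward to chunk_size tokens at a time and concatenate."""
    if len(tokens) % chunk_size != 0:
        # Assumption of this sketch: the chunked dimension divides evenly.
        raise ValueError("sequence length must be divisible by chunk_size")
    output = []
    for start in range(0, len(tokens), chunk_size):
        output.extend(feed_forward(tokens[start:start + chunk_size]))
    return output


# One sample with 4 tokens of width 2, chunked along the sequence dimension.
sequence = [[1.0, -2.0], [0.5, 4.0], [-1.0, 0.0], [3.0, 2.0]]
full = feed_forward(sequence)
chunked = chunked_feed_forward(sequence, chunk_size=2)
assert chunked == full
```

On the model itself, the corresponding call would presumably look like `model.enable_forward_chunking(chunk_size=1, dim=1)`, using the `chunk_size` and `dim` arguments of enable_forward_chunking.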
forward[[diffusers.HunyuanDiT2DModel.forward]]
The HunyuanDiT2DModel forward method.
Parameters:
hidden_states (torch.Tensor of shape (batch size, dim, height, width)) : The input tensor.
timestep (torch.LongTensor, optional) : Used to indicate the denoising step.
encoder_hidden_states (torch.Tensor of shape (batch size, sequence len, embed dims), optional) : Conditional embeddings for the cross-attention layer. This is the output of BertModel.
text_embedding_mask (torch.Tensor) : An attention mask of shape (batch, key_tokens) applied to encoder_hidden_states. This is the output of BertModel.
encoder_hidden_states_t5 (torch.Tensor of shape (batch size, sequence len, embed dims), optional) : Conditional embeddings for the cross-attention layer. This is the output of the T5 Text Encoder.
text_embedding_mask_t5 (torch.Tensor) : An attention mask of shape (batch, key_tokens) applied to encoder_hidden_states_t5. This is the output of the T5 Text Encoder.
image_meta_size (torch.Tensor) : Conditional embedding indicating the image sizes.
style (torch.Tensor) : Conditional embedding indicating the style.
image_rotary_emb (torch.Tensor) : The image rotary embeddings to apply on query and key tensors during attention calculation.
return_dict (bool) : Whether to return a dictionary.
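To make the phrase "a mask of shape (batch, key_tokens) applied to encoder_hidden_states" more concrete, here is a minimal sketch of one way such a mask can be applied directly to embeddings: positions where the mask is False are replaced by a padding vector. This is an assumption made for illustration, not a statement of how the model handles its masks internally. `apply_embedding_mask` and `padding_vector` are hypothetical names, and nested Python lists stand in for torch tensors.

```python
def apply_embedding_mask(embeddings, mask, padding_vector):
    """Keep embeddings where mask is True; substitute padding_vector elsewhere.

    embeddings: nested lists shaped [batch][key_tokens][embed_dim]
    mask:       nested lists shaped [batch][key_tokens] of booleans
    """
    return [
        [list(tok) if keep else list(padding_vector) for tok, keep in zip(seq, seq_mask)]
        for seq, seq_mask in zip(embeddings, mask)
    ]


# batch=1, key_tokens=3, embed_dim=2; the last token is padding.
embeddings = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]]
mask = [[True, True, False]]
padding_vector = [0.0, 0.0]  # hypothetical; a real model might learn this vector

masked = apply_embedding_mask(embeddings, mask, padding_vector)
assert masked[0][2] == padding_vector
```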
fuse_qkv_projections[[diffusers.HunyuanDiT2DModel.fuse_qkv_projections]]
Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value) are fused. For cross-attention modules, key and value projection matrices are fused.
> This API is 🧪 experimental.
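The idea behind fusing projections can be sketched with small matrices: concatenating the query, key, and value weight matrices column-wise lets a single matrix multiplication produce all three outputs, which are then split apart. The example uses pure Python and hypothetical integer weights (`w_q`, `w_k`, `w_v`) so the equality check is exact. It illustrates the arithmetic only and is not the diffusers implementation.

```python
def matmul(x, w):
    """Multiply a (tokens x d_in) matrix by a (d_in x d_out) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


# Hypothetical projection weights, each of shape (d_in=3, d_out=2).
w_q = [[1, 0], [0, 1], [2, 1]]
w_k = [[0, 1], [1, 0], [1, 1]]
w_v = [[1, 1], [2, 0], [0, 3]]
x = [[1, 2, 3], [4, 5, 6]]  # two tokens of width 3

# Unfused: three separate matrix multiplications.
q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

# Fused: concatenate the weights column-wise into one (3 x 6) matrix,
# multiply once, then slice the result back into query, key, and value.
w_qkv = [rq + rk + rv for rq, rk, rv in zip(w_q, w_k, w_v)]
qkv = matmul(x, w_qkv)
d = len(w_q[0])
q_fused = [row[:d] for row in qkv]
k_fused = [row[d:2 * d] for row in qkv]
v_fused = [row[2 * d:] for row in qkv]

assert (q_fused, k_fused, v_fused) == (q, k, v)
```

In cross-attention the query is computed from a different input than the key and value, which is consistent with only the key and value projections being fused there.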
set_default_attn_processor[[diffusers.HunyuanDiT2DModel.set_default_attn_processor]]
Disables custom attention processors and sets the default attention implementation.
unfuse_qkv_projections[[diffusers.HunyuanDiT2DModel.unfuse_qkv_projections]]
Disables the fused QKV projection if enabled.
> This API is 🧪 experimental.