LuminaNextDiT2DModel
The next version of the Diffusion Transformer model for 2D data from Lumina-T2X.
LuminaNextDiT2DModel
class diffusers.LuminaNextDiT2DModel
- (int) -- The width of the latent images. This is fixed during training because it is used to learn a number of position embeddings.
- patch_size (int, optional, defaults to 2) -- The size of each patch in the image. This parameter defines the resolution of the patches fed into the model.
- in_channels (int, optional, defaults to 4) -- The number of input channels for the model. Typically, this matches the number of channels in the input images.
- hidden_size (int, optional, defaults to 4096) -- The dimensionality of the hidden layers in the model. This parameter determines the width of the model's hidden representations.
- num_layers (int, optional, defaults to 32) -- The number of layers in the model, i.e. the depth of the network.
- num_attention_heads (int, optional, defaults to 32) -- The number of attention heads in each attention layer, i.e. how many separate attention mechanisms are used.
- num_kv_heads (int, optional, defaults to 8) -- The number of key-value heads in the attention mechanism, if different from the number of attention heads. If None, it defaults to num_attention_heads.
- multiple_of (int, optional, defaults to 256) -- A factor that the hidden size of the feed-forward layers should be a multiple of. This can help optimize for certain hardware configurations.
- ffn_dim_multiplier (float, optional) -- A multiplier for the dimensionality of the feed-forward network. If None, a default value based on the model configuration is used.
- norm_eps (float, optional, defaults to 1e-5) -- A small value added to the denominator for numerical stability in normalization layers.
- learn_sigma (bool, optional, defaults to True) -- Whether the model should also learn the sigma (variance) of its predictions in addition to the main output. When enabled, the model predicts twice as many output channels as it receives.
- qk_norm (bool, optional, defaults to True) -- Whether the queries and keys in the attention mechanism are normalized.
- cross_attention_dim (int, optional, defaults to 2048) -- The dimensionality of the text embeddings, i.e. the size of the text representations used by the model.
- scaling_factor (float, optional, defaults to 1.0) -- A scaling factor applied to certain parameters or layers in the model. It can be used to adjust the overall scale of the model's operations.
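To see how several of these parameters interact, here is a minimal sketch of a hypothetical helper, `derive_shapes`, that computes quantities implied by a configuration. It is not part of diffusers. Assumptions: the latent height and width are divisible by `patch_size`; the feed-forward width follows the LLaMA-style recipe (two thirds of 4 × `hidden_size`, then `ffn_dim_multiplier`, then rounded up to `multiple_of`), which Lumina-Next is assumed to mirror; and `learn_sigma` doubles the output channels.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class DerivedShapes:
    num_patches: int          # image tokens per sample after patchification
    head_dim: int             # per-head width of queries, keys and values
    queries_per_kv_head: int  # query heads sharing one key-value head
    ffn_hidden_dim: int       # feed-forward inner width after rounding
    out_channels: int         # channels predicted per latent pixel


def derive_shapes(
    height: int,
    width: int,
    patch_size: int = 2,
    in_channels: int = 4,
    hidden_size: int = 4096,
    num_attention_heads: int = 32,
    num_kv_heads: Optional[int] = 8,
    multiple_of: int = 256,
    ffn_dim_multiplier: Optional[float] = None,
    learn_sigma: bool = True,
) -> DerivedShapes:
    """Hypothetical helper: derive sizes implied by a LuminaNextDiT-like config."""
    if height % patch_size or width % patch_size:
        raise ValueError("latent height and width must be divisible by patch_size")
    if hidden_size % num_attention_heads:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    # As documented above, None falls back to num_attention_heads.
    kv_heads = num_kv_heads if num_kv_heads is not None else num_attention_heads
    if num_attention_heads % kv_heads:
        raise ValueError("num_attention_heads must be divisible by num_kv_heads")

    num_patches = (height // patch_size) * (width // patch_size)
    head_dim = hidden_size // num_attention_heads

    # Assumed LLaMA-style sizing: 2/3 of 4 * hidden for a gated FFN,
    # optional multiplier, then round up to a multiple of `multiple_of`.
    ffn = int(2 * (4 * hidden_size) / 3)
    if ffn_dim_multiplier is not None:
        ffn = int(ffn_dim_multiplier * ffn)
    ffn = multiple_of * math.ceil(ffn / multiple_of)

    # Assumed: learning sigma means predicting a variance per input channel too.
    out_channels = in_channels * 2 if learn_sigma else in_channels
    return DerivedShapes(num_patches, head_dim, num_attention_heads // kv_heads, ffn, out_channels)


print(derive_shapes(128, 128))
```

With the documented defaults and a hypothetical 128 × 128 latent, the sketch gives 4,096 image tokens, a head dimension of 128, four query heads per key-value head, an FFN width of 11,008 and 8 output channels.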
LuminaNextDiT: Diffusion model with a Transformer backbone.
Inherits ModelMixin and ConfigMixin so that it is compatible with samplers in diffusers such as StableDiffusionPipeline.
LuminaNextDiT2DModel.forward
- timestep (torch.Tensor) -- Tensor of diffusion timesteps of shape (N,).
- encoder_hidden_states (torch.Tensor) -- Tensor of caption features of shape (N, D).
- encoder_mask (torch.Tensor) -- Tensor of caption masks of shape (N, L).
Forward pass of LuminaNextDiT.
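The encoder_mask argument marks which of the L caption positions hold real tokens. Below is a minimal sketch of building such an (N, L) mask, assuming the common tokenizer convention of 1 for a real token and 0 for padding, and using plain Python lists in place of torch.Tensor. `build_encoder_mask` is a hypothetical helper, not a diffusers function.

```python
from typing import List


def build_encoder_mask(caption_lengths: List[int], max_length: int) -> List[List[int]]:
    """Build an (N, L) caption mask: 1 marks a real token, 0 marks padding.

    Hypothetical helper; the 1/0 convention is an assumption borrowed from
    tokenizer attention masks, not taken from the LuminaNextDiT2DModel source.
    """
    mask: List[List[int]] = []
    for length in caption_lengths:
        if not 0 <= length <= max_length:
            raise ValueError(f"caption length {length} is outside [0, {max_length}]")
        # Real tokens first, then right-padding up to the fixed length L.
        mask.append([1] * length + [0] * (max_length - length))
    return mask


# Two captions (N = 2) padded to L = 6 positions.
print(build_encoder_mask([3, 5], max_length=6))
# [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]]
```

In a real call the nested list would first be converted with torch.tensor(...), and its first dimension N should match the batch size of timestep.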