| # LuminaNextDiT2DModel | |
| A next-generation Diffusion Transformer model for 2D data from [Lumina-T2X](https://github.com/Alpha-VLLM/Lumina-T2X). | |
| ## LuminaNextDiT2DModel[[diffusers.LuminaNextDiT2DModel]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.LuminaNextDiT2DModel</name><anchor>diffusers.LuminaNextDiT2DModel</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/transformers/lumina_nextdit2d.py#L178</source><parameters>[{"name": "sample_size", "val": ": int = 128"}, {"name": "patch_size", "val": ": typing.Optional[int] = 2"}, {"name": "in_channels", "val": ": typing.Optional[int] = 4"}, {"name": "hidden_size", "val": ": typing.Optional[int] = 2304"}, {"name": "num_layers", "val": ": typing.Optional[int] = 32"}, {"name": "num_attention_heads", "val": ": typing.Optional[int] = 32"}, {"name": "num_kv_heads", "val": ": typing.Optional[int] = None"}, {"name": "multiple_of", "val": ": typing.Optional[int] = 256"}, {"name": "ffn_dim_multiplier", "val": ": typing.Optional[float] = None"}, {"name": "norm_eps", "val": ": typing.Optional[float] = 1e-05"}, {"name": "learn_sigma", "val": ": typing.Optional[bool] = True"}, {"name": "qk_norm", "val": ": typing.Optional[bool] = True"}, {"name": "cross_attention_dim", "val": ": typing.Optional[int] = 2048"}, {"name": "scaling_factor", "val": ": typing.Optional[float] = 1.0"}]</parameters><paramsdesc>- **sample_size** (`int`, *optional*, defaults to 128) -- The width of the latent images. This is fixed during training because | |
| it is used to learn a number of position embeddings. | |
| - **patch_size** (`int`, *optional*, defaults to 2) -- | |
| The size of each patch in the image. This parameter defines the resolution of patches fed into the model. | |
| - **in_channels** (`int`, *optional*, defaults to 4) -- | |
| The number of input channels for the model. Typically, this matches the number of channels of the input | |
| latents. | |
| - **hidden_size** (`int`, *optional*, defaults to 2304) -- | |
| The dimensionality of the hidden layers in the model. This parameter determines the width of the model's | |
| hidden representations. | |
| - **num_layers** (`int`, *optional*, defaults to 32) -- | |
| The number of layers in the model. This defines the depth of the neural network. | |
| - **num_attention_heads** (`int`, *optional*, defaults to 32) -- | |
| The number of attention heads in each attention layer. This parameter specifies how many separate attention | |
| mechanisms are used. | |
| - **num_kv_heads** (`int`, *optional*, defaults to `None`) -- | |
| The number of key-value heads in the attention mechanism, if different from the number of attention heads. | |
| If `None`, it defaults to `num_attention_heads`. | |
| - **multiple_of** (`int`, *optional*, defaults to 256) -- | |
| The hidden dimension of the feed-forward network is rounded up to a multiple of this value, which can help | |
| performance on some hardware. | |
| - **ffn_dim_multiplier** (`float`, *optional*) -- | |
| A multiplier for the dimensionality of the feed-forward network. If `None`, no extra multiplier is applied and | |
| the default feed-forward dimension derived from `hidden_size` is used. | |
| - **norm_eps** (`float`, *optional*, defaults to 1e-5) -- | |
| A small value added to the denominator for numerical stability in normalization layers. | |
| - **learn_sigma** (`bool`, *optional*, defaults to True) -- | |
| Whether the model also predicts the variance (sigma) of the diffusion noise. If True, the number of output | |
| channels is twice `in_channels`. | |
| - **qk_norm** (`bool`, *optional*, defaults to True) -- | |
| Indicates if the queries and keys in the attention mechanism should be normalized. | |
| - **cross_attention_dim** (`int`, *optional*, defaults to 2048) -- | |
| The dimensionality of the text embeddings. This parameter defines the size of the text representations used | |
| in the model. | |
| - **scaling_factor** (`float`, *optional*, defaults to 1.0) -- | |
| A scaling factor applied to certain parameters or layers in the model. This can be used for adjusting the | |
| overall scale of the model's operations.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| LuminaNextDiT: Diffusion model with a Transformer backbone. | |
| Inherits from `ModelMixin` and `ConfigMixin` so that it can be configured, saved, loaded, and used in pipelines like other diffusers models. | |
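The attention and feed-forward parameters above interact, and the resulting sizes are easy to check by hand. The following is a minimal plain-Python sketch, not diffusers code: `attention_sizes` and `ffn_inner_dim` are hypothetical helpers. It assumes the per-head dimension is `hidden_size // num_attention_heads`, and that the feed-forward width follows the LLaMA-style rule of taking two thirds of `4 * hidden_size`, scaling it by `ffn_dim_multiplier` when one is given, and rounding up to a multiple of `multiple_of`. Check these assumptions against the model source before relying on the numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionSizes:
    head_dim: int
    num_kv_heads: int
    queries_per_kv_head: int


def attention_sizes(
    hidden_size: int = 2304,
    num_attention_heads: int = 32,
    num_kv_heads: Optional[int] = None,
) -> AttentionSizes:
    """Hypothetical helper: derive per-head sizes from the constructor arguments."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    # As documented: num_kv_heads=None falls back to num_attention_heads.
    kv_heads = num_attention_heads if num_kv_heads is None else num_kv_heads
    if num_attention_heads % kv_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_kv_heads")
    return AttentionSizes(
        head_dim=hidden_size // num_attention_heads,  # assumed head-size rule
        num_kv_heads=kv_heads,
        # With fewer KV heads than query heads, each KV head is shared by a group of query heads.
        queries_per_kv_head=num_attention_heads // kv_heads,
    )


def ffn_inner_dim(
    hidden_size: int = 2304,
    multiple_of: int = 256,
    ffn_dim_multiplier: Optional[float] = None,
) -> int:
    """Hypothetical helper: LLaMA-style feed-forward width (an assumption, not taken from this page)."""
    inner = int(2 * (4 * hidden_size) / 3)
    if ffn_dim_multiplier is not None:
        inner = int(ffn_dim_multiplier * inner)
    # Round up to the nearest multiple of `multiple_of`.
    return multiple_of * ((inner + multiple_of - 1) // multiple_of)


if __name__ == "__main__":
    print(attention_sizes())
    print(attention_sizes(num_kv_heads=8))
    print(ffn_inner_dim())
    print(ffn_inner_dim(ffn_dim_multiplier=1.3))
```

With the defaults, this gives 72-dimensional heads and a 6144-wide feed-forward layer; setting `num_kv_heads=8` makes every key-value head serve four query heads (grouped-query attention).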
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>diffusers.LuminaNextDiT2DModel.forward</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/transformers/lumina_nextdit2d.py#L291</source><parameters>[{"name": "hidden_states", "val": ": Tensor"}, {"name": "timestep", "val": ": Tensor"}, {"name": "encoder_hidden_states", "val": ": Tensor"}, {"name": "encoder_mask", "val": ": Tensor"}, {"name": "image_rotary_emb", "val": ": Tensor"}, {"name": "cross_attention_kwargs", "val": ": typing.Dict[str, typing.Any] = None"}, {"name": "return_dict", "val": " = True"}]</parameters><paramsdesc>- **hidden_states** (`torch.Tensor`) -- Input latent tensor of shape (N, C, H, W). | |
| - **timestep** (`torch.Tensor`) -- Tensor of diffusion timesteps of shape (N,). | |
| - **encoder_hidden_states** (`torch.Tensor`) -- Tensor of caption features of shape (N, L, D). | |
| - **encoder_mask** (`torch.Tensor`) -- Tensor of caption masks of shape (N, L). | |
| - **image_rotary_emb** (`torch.Tensor`) -- Precomputed rotary position embeddings for the image tokens. | |
| - **cross_attention_kwargs** (`Dict[str, Any]`, *optional*) -- Additional keyword arguments passed to the attention layers. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- Whether to return a `Transformer2DModelOutput` instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Forward pass of LuminaNextDiT. | |
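Because `encoder_mask` has one entry per caption token, the caption features it masks are effectively a sequence of shape (N, L, D), padded to a common length L. Below is a minimal plain-Python sketch of building such a padding mask from per-caption token counts and cross-checking the documented input shapes. `caption_mask` and `check_forward_shapes` are hypothetical helpers, and the convention of 1 for a real token and 0 for padding is an assumption; in practice the mask usually comes straight from the text tokenizer.

```python
from typing import List, Sequence


def caption_mask(lengths: Sequence[int], max_len: int) -> List[List[int]]:
    """Hypothetical helper: (N, L) padding mask with 1 for real tokens and 0 for padding."""
    if any(n < 0 or n > max_len for n in lengths):
        raise ValueError("each caption length must lie in [0, max_len]")
    return [[1] * n + [0] * (max_len - n) for n in lengths]


def check_forward_shapes(
    hidden_states_shape: Sequence[int],          # (N, C, H, W)
    timestep_shape: Sequence[int],               # (N,)
    encoder_hidden_states_shape: Sequence[int],  # (N, L, D)
    encoder_mask_shape: Sequence[int],           # (N, L)
) -> None:
    """Hypothetical helper: raise ValueError if the input shapes disagree with each other."""
    if len(hidden_states_shape) != 4:
        raise ValueError("hidden_states must be 4-D: (N, C, H, W)")
    n = hidden_states_shape[0]
    if tuple(timestep_shape) != (n,):
        raise ValueError("timestep must have shape (N,)")
    if len(encoder_hidden_states_shape) != 3 or encoder_hidden_states_shape[0] != n:
        raise ValueError("encoder_hidden_states must have shape (N, L, D)")
    if tuple(encoder_mask_shape) != tuple(encoder_hidden_states_shape[:2]):
        raise ValueError("encoder_mask must have shape (N, L), matching encoder_hidden_states")


if __name__ == "__main__":
    # Two captions with 3 and 5 tokens, padded to L = 6.
    print(caption_mask([3, 5], max_len=6))  # [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]]
    # A batch of 2 default-sized latents with 2048-dimensional caption features.
    check_forward_shapes((2, 4, 128, 128), (2,), (2, 6, 2048), (2, 6))
    print("shapes OK")
```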
| </div></div> | |
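The `sample_size`, `patch_size`, `in_channels`, and `learn_sigma` arguments together determine how many image tokens the transformer processes and how many channels it returns. The sketch below assumes the usual DiT scheme of non-overlapping `patch_size` x `patch_size` patches and, as in the original DiT, that `learn_sigma=True` doubles the output channels so the model can predict a variance alongside the noise. Both helpers are hypothetical and only illustrate the arithmetic.

```python
from typing import Tuple


def image_token_grid(height: int, width: int, patch_size: int = 2) -> Tuple[int, int]:
    """Hypothetical helper: (rows, cols) of non-overlapping patches in an H x W latent."""
    if height % patch_size or width % patch_size:
        raise ValueError("latent height and width must be divisible by patch_size")
    return height // patch_size, width // patch_size


def output_channels(in_channels: int = 4, learn_sigma: bool = True) -> int:
    """Assumed DiT convention: predicting sigma as well doubles the output channels."""
    return in_channels * 2 if learn_sigma else in_channels


if __name__ == "__main__":
    rows, cols = image_token_grid(128, 128)
    print(rows * cols)        # 4096 image tokens
    print(output_channels())  # 8
```

For the default 128 x 128 latent with `patch_size=2`, this gives a 64 x 64 grid, that is 4096 image tokens, and 8 output channels.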