	# Flux2Transformer2DModel

	A Transformer model for image-like data from [Flux2](https://hf.co/black-forest-labs/FLUX.2-dev).

	## Flux2Transformer2DModel[[diffusers.Flux2Transformer2DModel]]

	#### diffusers.Flux2Transformer2DModel[[diffusers.Flux2Transformer2DModel]]

	[Source](https://github.com/huggingface/diffusers/blob/v0.36.0/src/diffusers/models/transformers/transformer_flux2.py#L631)

	The Transformer model introduced in Flux 2.

	Reference: https://blackforestlabs.ai/announcing-black-forest-labs/

	#### forward[[diffusers.Flux2Transformer2DModel.forward]]

	[Source](https://github.com/huggingface/diffusers/blob/v0.36.0/src/diffusers/models/transformers/transformer_flux2.py#L763)

	`forward(hidden_states: Tensor, encoder_hidden_states: Tensor = None, timestep: LongTensor = None, img_ids: Tensor = None, txt_ids: Tensor = None, guidance: Tensor = None, joint_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None, return_dict: bool = True)`

	- hidden_states (`torch.Tensor` of shape `(batch_size, image_sequence_length, in_channels)`) --
	Input `hidden_states`.
	- encoder_hidden_states (`torch.Tensor` of shape `(batch_size, text_sequence_length, joint_attention_dim)`) --
	Conditional embeddings (embeddings computed from the input conditions such as prompts) to use.
	- timestep (`torch.LongTensor`) --
	Indicates the current denoising step.
	- block_controlnet_hidden_states (`list` of `torch.Tensor`) --
	A list of tensors that, if specified, are added to the residuals of the transformer blocks.
	- joint_attention_kwargs (`dict`, optional) --
	A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under
	`self.processor` in
	[diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
	- return_dict (`bool`, optional, defaults to `True`) --
	Whether or not to return a `~models.transformer_2d.Transformer2DModelOutput` instead of a plain
	tuple.

	The `Flux2Transformer2DModel` forward method.
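The shape contract listed for `hidden_states` and `encoder_hidden_states` can be checked before a forward call. The following is a minimal sketch in plain Python, not part of diffusers: `check_forward_shapes` is a hypothetical helper, and its default values for `in_channels` and `joint_attention_dim` are assumed to match the constructor defaults documented on this page.

```python
from typing import Tuple


def check_forward_shapes(
    hidden_states_shape: Tuple[int, ...],
    encoder_hidden_states_shape: Tuple[int, ...],
    in_channels: int = 128,            # documented constructor default
    joint_attention_dim: int = 15360,  # documented constructor default
) -> None:
    """Hypothetical helper: raise ValueError if the documented shape contract is violated."""
    if len(hidden_states_shape) != 3:
        raise ValueError(
            "hidden_states must have shape (batch_size, image_sequence_length, in_channels)"
        )
    if len(encoder_hidden_states_shape) != 3:
        raise ValueError(
            "encoder_hidden_states must have shape "
            "(batch_size, text_sequence_length, joint_attention_dim)"
        )
    # Both tensors are documented with the same leading batch_size dimension.
    if hidden_states_shape[0] != encoder_hidden_states_shape[0]:
        raise ValueError("hidden_states and encoder_hidden_states have different batch sizes")
    if hidden_states_shape[2] != in_channels:
        raise ValueError(f"last dim of hidden_states must be {in_channels}")
    if encoder_hidden_states_shape[2] != joint_attention_dim:
        raise ValueError(f"last dim of encoder_hidden_states must be {joint_attention_dim}")


# Example: one image with 4096 latent tokens and a 512-token prompt embedding.
check_forward_shapes((1, 4096, 128), (1, 512, 15360))
```

The sequence lengths in the example call are arbitrary. The check does not constrain them.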

	Parameters:

	patch_size (`int`, defaults to `1`) : Patch size to turn the input data into small patches.

	in_channels (`int`, defaults to `128`) : The number of channels in the input.

	out_channels (`int`, optional, defaults to `None`) : The number of channels in the output. If not specified, it defaults to `in_channels`.

	num_layers (`int`, defaults to `8`) : The number of layers of dual stream DiT blocks to use.

	num_single_layers (`int`, defaults to `48`) : The number of layers of single stream DiT blocks to use.

	attention_head_dim (`int`, defaults to `128`) : The number of dimensions to use for each attention head.

	num_attention_heads (`int`, defaults to `48`) : The number of attention heads to use.

	joint_attention_dim (`int`, defaults to `15360`) : The number of dimensions to use for the joint attention (embedding/channel dimension of `encoder_hidden_states`).

	pooled_projection_dim (`int`, defaults to `768`) : The number of dimensions to use for the pooled projection.

	guidance_embeds (`bool`, defaults to `True`) : Whether to use guidance embeddings for the guidance-distilled variant of the model.

	axes_dims_rope (`Tuple[int]`, defaults to `(32, 32, 32, 32)`) : The dimensions to use for the rotary positional embeddings.
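A few quantities follow arithmetically from the defaults listed above. The sketch below is illustrative only: `Flux2ConfigSketch` is a hypothetical container, not a diffusers class. It relies on two assumptions that this page does not state: that the hidden width equals `num_attention_heads * attention_head_dim`, and that the entries of `axes_dims_rope` sum to `attention_head_dim`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Flux2ConfigSketch:
    """Hypothetical container mirroring a subset of the documented defaults."""

    num_layers: int = 8
    num_single_layers: int = 48
    attention_head_dim: int = 128
    num_attention_heads: int = 48
    axes_dims_rope: Tuple[int, ...] = (32, 32, 32, 32)

    @property
    def inner_dim(self) -> int:
        # Assumption: hidden width = number of heads * per-head dimension.
        return self.num_attention_heads * self.attention_head_dim

    @property
    def total_blocks(self) -> int:
        # Dual-stream blocks plus single-stream blocks.
        return self.num_layers + self.num_single_layers

    @property
    def rope_dims_match_head_dim(self) -> bool:
        # Assumption: the rotary axes partition the per-head dimension.
        return sum(self.axes_dims_rope) == self.attention_head_dim


cfg = Flux2ConfigSketch()
print(cfg.inner_dim)                 # 6144
print(cfg.total_blocks)              # 56
print(cfg.rope_dims_match_head_dim)  # True
```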

	Returns:

	If `return_dict` is True, an `~models.transformer_2d.Transformer2DModelOutput` is returned, otherwise a
	`tuple` where the first element is the sample tensor.
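Calling code that must accept either return form can normalize the result in one place. This is a sketch with stand-ins, not the diffusers implementation: `OutputSketch` and `fake_forward` are hypothetical, and the attribute name `sample` is an assumption based on the description of the tuple's first element as the sample tensor.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass
class OutputSketch:
    """Hypothetical stand-in for the output class; only `sample` is modeled."""

    sample: Any


def fake_forward(x: Any, return_dict: bool = True) -> Union[OutputSketch, Tuple[Any]]:
    """Hypothetical stand-in for a model call following the documented convention."""
    if not return_dict:
        return (x,)                # plain tuple, sample first
    return OutputSketch(sample=x)  # structured output


def extract_sample(output: Union[OutputSketch, Tuple[Any, ...]]) -> Any:
    """Return the sample regardless of which form the call produced."""
    if isinstance(output, tuple):
        return output[0]
    return output.sample


# Both call styles yield the same sample.
a = extract_sample(fake_forward("latents", return_dict=True))
b = extract_sample(fake_forward("latents", return_dict=False))
print(a == b)  # True
```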
