# HiDreamImageTransformer2DModel

A Transformer model for image-like data from HiDream-I1.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import HiDreamImageTransformer2DModel

transformer = HiDreamImageTransformer2DModel.from_pretrained(
    "HiDream-ai/HiDream-I1-Full", subfolder="transformer", torch_dtype=torch.bfloat16
)
```

## Loading GGUF quantized checkpoints for HiDream-I1

GGUF checkpoints for `HiDreamImageTransformer2DModel` can be loaded using [`~FromOriginalModelMixin.from_single_file`].

```python
import torch
from diffusers import GGUFQuantizationConfig, HiDreamImageTransformer2DModel

ckpt_path = "https://huggingface.co/city96/HiDream-I1-Dev-gguf/blob/main/hidream-i1-dev-Q2_K.gguf"
transformer = HiDreamImageTransformer2DModel.from_single_file(
    ckpt_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
```

## HiDreamImageTransformer2DModel

**class** `diffusers.HiDreamImageTransformer2DModel` ([source](https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/transformers/transformer_hidream_image.py#L605))

```python
class HiDreamImageTransformer2DModel(
    patch_size: Optional[int] = None,
    in_channels: int = 64,
    out_channels: Optional[int] = None,
    num_layers: int = 16,
    num_single_layers: int = 32,
    attention_head_dim: int = 128,
    num_attention_heads: int = 20,
    caption_channels: List[int] = None,
    text_emb_dim: int = 2048,
    num_routed_experts: int = 4,
    num_activated_experts: int = 2,
    axes_dims_rope: Tuple[int, int] = (32, 32),
    max_resolution: Tuple[int, int] = (128, 128),
    llama_layers: List[int] = None,
    force_inference_output: bool = False,
)
```
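As a quick sanity check on these defaults, the attention width of the model can be derived from the head count and head dimension. This is a minimal sketch assuming the common diffusers convention that the inner dimension is `num_attention_heads * attention_head_dim`; the variable names below simply mirror the signature above.

```python
# Default values taken from the class signature above
num_attention_heads = 20
attention_head_dim = 128

# Assuming the usual convention inner_dim = heads * head_dim
inner_dim = num_attention_heads * attention_head_dim
print(inner_dim)  # 2560
```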

## Transformer2DModelOutput

**class** `diffusers.models.modeling_outputs.Transformer2DModelOutput` ([source](https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/modeling_outputs.py#L21))

```python
class Transformer2DModelOutput(sample: torch.Tensor)
```

**Parameters**

- **sample** (`torch.Tensor` of shape `(batch_size, num_channels, height, width)` or `(batch_size, num_vector_embeds - 1, num_latent_pixels)` if `Transformer2DModel` is discrete) -- The hidden states output conditioned on the `encoder_hidden_states` input. If discrete, returns probability distributions for the unnoised latent pixels.

The output of Transformer2DModel.
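Conceptually, this output type is a small dataclass-style container with a single `sample` field, as the signature above shows. The sketch below uses an illustrative stand-in class (a plain nested list replaces the `torch.Tensor`) to show the shape of usage; the name `OutputSketch` is hypothetical, not part of diffusers.

```python
from dataclasses import dataclass
from typing import Any

# Illustrative stand-in for Transformer2DModelOutput: a container
# holding one `sample` field (a torch.Tensor in the real class).
@dataclass
class OutputSketch:
    sample: Any

out = OutputSketch(sample=[[0.1, 0.2]])
print(out.sample)  # [[0.1, 0.2]]
```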
