Ideogram 4

Ideogram 4 is a flow-matching text-to-image model that uses a multimodal text encoder and an asymmetric classifier-free guidance scheme: a dedicated unconditional_transformer produces the negative branch with zeroed text features, while the main transformer consumes the full packed text + image sequence.

The pipeline defaults are the recommended settings for best quality, so a plain pipe(prompt) call needs no tuning: 48 flow-matching steps on a logit-normal schedule (mu=0.0, std=1.5), with classifier-free guidance held at 7.0 for the first 45 steps and dropped to 3.0 for the final 3 "polish" steps.
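
The default guidance schedule described above can be sketched in plain Python. This is a minimal illustration, not the pipeline's own code: the helper name build_guidance_schedule and its arguments are hypothetical, and only the values (48 steps, 7.0, 3.0, 3 polish steps) come from this page.

```python
def build_guidance_schedule(num_inference_steps=48, main_scale=7.0,
                            polish_scale=3.0, polish_steps=3):
    """Hypothetical helper: main_scale first, polish_scale for the last polish_steps."""
    if not 0 <= polish_steps <= num_inference_steps:
        raise ValueError("polish_steps must be between 0 and num_inference_steps")
    main_steps = num_inference_steps - polish_steps
    # The first entry corresponds to the first step (largest noise level).
    return [main_scale] * main_steps + [polish_scale] * polish_steps


schedule = build_guidance_schedule()
print(len(schedule), schedule[0], schedule[-1])
```

Because guidance_schedule must have length num_inference_steps, a list built this way could be passed as guidance_schedule when the step count is changed from the default.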

Key inference-time knobs are exposed via the pipeline call:

num_inference_steps, mu, and std control the resolution-aware logit-normal flow-matching schedule.
guidance_scale (or a full per-step guidance_schedule) blends the conditional and unconditional velocities.
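
To make "resolution-aware" concrete, the sketch below computes the shifted schedule mean described for the mu parameter (half the log of the resolution ratio relative to 512x512) and maps a logit-normal distribution onto noise levels in (0, 1). Two things are assumptions rather than facts from this page: the shift is taken as additive (larger images get a larger mean), and the noise levels are taken at evenly spaced quantiles of the distribution. The function names are hypothetical, and the real pipeline may discretize the schedule differently.

```python
import math
from statistics import NormalDist


def shifted_mu(height, width, mu=0.0, base_resolution=512):
    """Base mean plus half the log of the pixel-count ratio (sign is an assumption)."""
    ratio = (height * width) / (base_resolution * base_resolution)
    return mu + 0.5 * math.log(ratio)


def logit_normal_sigmas(num_steps, mu, std):
    """ASSUMPTION: noise levels are sigmoid(mu + std * z) at evenly spaced quantiles."""
    normal = NormalDist()
    sigmas = []
    for i in range(num_steps):
        q = 1.0 - (i + 0.5) / num_steps  # quantile midpoints, from high noise to low
        z = normal.inv_cdf(q)
        sigmas.append(1.0 / (1.0 + math.exp(-(mu + std * z))))
    return sigmas


mu_1024 = shifted_mu(1024, 1024)  # 0.5 * ln(4) = ln(2)
sigmas = logit_normal_sigmas(48, mu_1024, std=1.5)
print(round(mu_1024, 4), round(sigmas[0], 4), round(sigmas[-1], 4))
```

Under these assumptions, a 512x512 image leaves the mean at mu, and each doubling of both sides adds ln(2) (about 0.693) to it, which moves more of the steps toward high noise levels.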

Text-to-image

import torch
from diffusers import Ideogram4Pipeline

pipe = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A photo of a cat holding a sign that says hello world"
# Sampling defaults are the recommended settings; only the resolution is overridden here (the default is 2048x2048).
image = pipe(prompt, height=1024, width=1024, generator=torch.Generator("cuda").manual_seed(0)).images[0]
image.save("ideogram4.png")
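
The height and width parameters must be multiples of vae_scale_factor * patch_size. When a requested size is arbitrary, it can be snapped to a valid one before calling the pipeline. The helper below is a hypothetical sketch, and the multiple of 16 used in the example is only a placeholder: the real value depends on the loaded checkpoint.

```python
def snap_to_multiple(size, multiple):
    """Hypothetical helper: round size down to the nearest positive multiple."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    # Never return 0: fall back to one full multiple for very small sizes.
    return max(multiple, (size // multiple) * multiple)


# 16 is an illustrative placeholder for vae_scale_factor * patch_size.
height = snap_to_multiple(1000, 16)
width = snap_to_multiple(1024, 16)
print(height, width)
```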

Ideogram4Pipeline[[diffusers.Ideogram4Pipeline]]

diffusers.Ideogram4Pipeline[[diffusers.Ideogram4Pipeline]]

Source

Text-to-image pipeline for Ideogram4.

Ideogram4 is a flow-matching model trained with asymmetric classifier-free guidance: a transformer consumes text-conditioned features alongside the image latents, while a separate unconditional_transformer denoises with zeroed text features. The two velocity predictions are linearly blended each step.
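
The linear blend follows the formula given for guidance_scale: v = guidance_scale * v_pos + (1 - guidance_scale) * v_neg. The sketch below applies it elementwise to plain Python lists so it runs without any dependencies. The function name blend_velocities is hypothetical, and the pipeline itself operates on tensors.

```python
def blend_velocities(v_pos, v_neg, guidance_scale):
    """Elementwise classifier-free guidance blend of two velocity predictions."""
    return [
        guidance_scale * pos + (1.0 - guidance_scale) * neg
        for pos, neg in zip(v_pos, v_neg)
    ]


v_pos = [1.0, -2.0]  # conditional prediction (main transformer)
v_neg = [0.5, 0.0]   # unconditional prediction (unconditional_transformer)
print(blend_velocities(v_pos, v_neg, 7.0))
```

A scale of 1.0 returns the conditional prediction unchanged, and larger scales extrapolate away from the unconditional prediction, since the formula is equivalent to v_neg + guidance_scale * (v_pos - v_neg).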

__call__[[diffusers.Ideogram4Pipeline.__call__]]

Source: https://github.com/huggingface/diffusers/blob/vr_13863/src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py#L407

__call__(
    prompt: str | list[str] | None = None,
    height: int = 2048,
    width: int = 2048,
    num_inference_steps: int = 48,
    guidance_scale: float | None = None,
    guidance_schedule: list[float] | torch.Tensor | None = (7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0),
    mu: float = 0.0,
    std: float = 1.5,
    max_sequence_length: int = 2048,
    num_images_per_prompt: int = 1,
    generator: torch._C.Generator | list[torch._C.Generator] | None = None,
    latents: torch.Tensor | None = None,
    output_type: str = 'pil',
    return_dict: bool = True,
    callback_on_step_end: typing.Optional[typing.Callable[[ForwardRef('Ideogram4Pipeline'), int, int, dict[str, typing.Any]], dict[str, typing.Any]]] = None,
    callback_on_step_end_tensor_inputs: list = ['latents'],
)

Parameters:

prompt (str or list[str]) -- Prompt(s) to guide image generation.

height (int, optional, defaults to 2048) -- Output image height in pixels; must be a multiple of vae_scale_factor * patch_size.
width (int, optional, defaults to 2048) -- Output image width in pixels; must be a multiple of vae_scale_factor * patch_size.
num_inference_steps (int, optional, defaults to 48) -- Number of flow-matching steps. The default is the recommended setting for best quality.
guidance_scale (float, optional, defaults to None) -- Constant classifier-free guidance scale applied at every step. The conditional and unconditional velocity predictions are blended as v = guidance_scale * v_pos + (1 - guidance_scale) * v_neg. Mutually exclusive with guidance_schedule: setting both raises an error, and because guidance_schedule has a non-None default, guidance_scale must be passed together with guidance_schedule=None.
guidance_schedule (list[float] or torch.Tensor, optional) -- Per-step guidance scale schedule; must have length num_inference_steps. The first entry corresponds to the first step (largest noise level). Mutually exclusive with guidance_scale; exactly one must be set. Defaults to the recommended schedule (7.0 for the main steps, dropping to 3.0 for the final 3 "polish" steps). To use a constant scale instead, pass guidance_scale and guidance_schedule=None.
mu (float, optional, defaults to 0.0) -- Base mean of the logit-normal flow-matching schedule. The schedule mean is shifted by half the log of the resolution ratio relative to 512x512.
std (float, optional, defaults to 1.5) -- Standard deviation of the logit-normal flow-matching schedule.
max_sequence_length (int, optional, defaults to 2048) -- Maximum number of text tokens per prompt.
num_images_per_prompt (int, optional, defaults to 1) -- Number of images to generate per prompt.
generator (torch.Generator or list[torch.Generator], optional) -- Generator(s) used to make sampling deterministic.
latents (torch.Tensor, optional) -- Pre-generated noise of shape (batch_size, num_image_tokens, latent_dim).
output_type (str, optional, defaults to "pil") -- One of "pil", "np", "pt", or "latent".
return_dict (bool, optional, defaults to True) -- Whether to return an Ideogram4PipelineOutput.
callback_on_step_end (Callable, optional) -- Callback invoked at the end of every denoising step.
callback_on_step_end_tensor_inputs (list[str], optional) -- Names of tensors to expose to the callback via callback_kwargs.
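
A step-end callback can be sketched from the type annotation in the signature: it receives the pipeline, two integers, and a dict of tensors, and returns a dict. Naming the two integers step and timestep follows the usual diffusers convention and is an assumption here, as is the function name record_step.

```python
step_log = []


def record_step(pipe, step, timestep, callback_kwargs):
    """Record the shape of the latents at each step and pass the tensors through."""
    latents = callback_kwargs.get("latents")
    shape = getattr(latents, "shape", None)
    step_log.append((step, timestep, None if shape is None else tuple(shape)))
    # Return the dict unchanged so the pipeline continues with the same tensors.
    return callback_kwargs
```

Such a function would be passed as callback_on_step_end=record_step, with callback_on_step_end_tensor_inputs left at its default of ['latents'].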

Run text-to-image generation.

Examples:

>>> import torch
>>> from diffusers import Ideogram4Pipeline

>>> pipe = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16)
>>> pipe.to("cuda")

>>> prompt = "A photo of a cat holding a sign that says hello world"
>>> # The defaults are the recommended settings for best quality.
>>> image = pipe(prompt, height=2048, width=2048, generator=torch.Generator("cuda").manual_seed(0)).images[0]
>>> image.save("ideogram4.png")

Parameters:

scheduler (FlowMatchEulerDiscreteScheduler) : Flow-matching scheduler. The pipeline overrides the default sigma schedule with a resolution-aware logit-normal schedule.

vae (AutoencoderKLFlux2) : Variational auto-encoder used to decode latents back into images.

text_encoder (PreTrainedModel) : Multimodal text encoder. The pipeline consumes hidden states from a fixed set of intermediate decoder layers (see QWEN3_VL_ACTIVATION_LAYERS).

tokenizer (AutoTokenizer) : Tokenizer paired with text_encoder.

transformer (Ideogram4Transformer2DModel) : Conditional flow-matching transformer.

unconditional_transformer (Ideogram4Transformer2DModel) : Unconditional (asymmetric-CFG) flow-matching transformer.

Returns:

Ideogram4PipelineOutput or tuple.

encode_prompt[[diffusers.Ideogram4Pipeline.encode_prompt]]

Source

Prepare the conditioning for the packed text+image sequence (one entry per prompt).

Returns a flat tuple (prompt_embeds, position_ids, segment_ids, indicator). The unconditional branch carries no text, so the pipeline builds its (zeroed) inputs directly rather than encoding a negative prompt.

Ideogram4PipelineOutput[[diffusers.pipelines.ideogram4.Ideogram4PipelineOutput]]

diffusers.pipelines.ideogram4.Ideogram4PipelineOutput[[diffusers.pipelines.ideogram4.Ideogram4PipelineOutput]]

Source

Output class for the Ideogram 4 pipeline.

Parameters:

images (list[PIL.Image.Image] or np.ndarray) : List of denoised PIL images of length batch_size, or numpy array of shape (batch_size, height, width, num_channels).
