	# Ideogram 4

	Ideogram 4 is a flow-matching text-to-image model that uses a multimodal text encoder and an asymmetric
	classifier-free guidance scheme: a dedicated `unconditional_transformer` produces the negative branch with zeroed text
	features, while the main `transformer` consumes the full packed text + image sequence.

	The pipeline defaults are the recommended settings, so a plain `pipe(prompt)` call gives best-quality results out
	of the box: 48 flow-matching steps on a logit-normal schedule (`mu=0.0`, `std=1.5`), with classifier-free guidance
	held at 7.0 for the main steps and lowered to 3.0 for the final 3 "polish" steps.

	Key inference-time knobs are exposed via the pipeline call:

	- `num_inference_steps`, `mu`, and `std` control the resolution-aware logit-normal flow-matching schedule.
	- `guidance_scale` (or a full per-step `guidance_schedule`) blends the conditional and unconditional velocities.
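As a minimal sketch of how a custom per-step schedule could be built, the helper below reproduces the shape of the default described above: a constant main scale followed by a few lower "polish" steps. `make_guidance_schedule` is a hypothetical helper, not part of diffusers; only the `guidance_schedule` and `num_inference_steps` arguments come from the pipeline call.

```python
def make_guidance_schedule(num_inference_steps, main_scale=7.0, polish_scale=3.0, num_polish_steps=3):
    """Return one guidance scale per step: main_scale first, polish_scale for the last steps.

    Hypothetical helper for illustration; not part of diffusers.
    """
    if not 0 <= num_polish_steps <= num_inference_steps:
        raise ValueError("num_polish_steps must be between 0 and num_inference_steps")
    num_main_steps = num_inference_steps - num_polish_steps
    return [main_scale] * num_main_steps + [polish_scale] * num_polish_steps


# Same shape as the documented default: 48 steps, the final 3 at 3.0.
schedule = make_guidance_schedule(48)

# A shorter run needs a schedule of matching length, for example:
# image = pipe(prompt, num_inference_steps=24, guidance_schedule=make_guidance_schedule(24)).images[0]
```

Because the schedule must have exactly `num_inference_steps` entries, deriving it from the step count avoids a length mismatch when the number of steps changes.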

	## Text-to-image

	```python
	import torch
	from diffusers import Ideogram4Pipeline

	pipe = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16)
	pipe.to("cuda")

	prompt = "A photo of a cat holding a sign that says hello world"
	# Sampling defaults (steps, schedule, guidance) are the recommended settings; only the resolution is lowered from 2048x2048.
	image = pipe(prompt, height=1024, width=1024, generator=torch.Generator("cuda").manual_seed(0)).images[0]
	image.save("ideogram4.png")
	```

	## Ideogram4Pipeline[[diffusers.Ideogram4Pipeline]]

	#### diffusers.Ideogram4Pipeline[[diffusers.Ideogram4Pipeline]]

	[Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py#L133)

	Text-to-image pipeline for Ideogram4.

	Ideogram4 is a flow-matching model trained with asymmetric classifier-free guidance: a `transformer` consumes
	text-conditioned features alongside the image latents, while a separate `unconditional_transformer` denoises with
	zeroed text features. The two velocity predictions are linearly blended each step.
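The blend can be sketched in a few lines. This follows the formula given for `guidance_scale` in the `__call__` parameters, `v = guidance_scale * v_pos + (1 - guidance_scale) * v_neg`. Plain Python lists stand in for the velocity tensors, and `blend_velocities` is an illustrative name rather than a pipeline method.

```python
def blend_velocities(v_pos, v_neg, guidance_scale):
    """Linearly blend conditional (v_pos) and unconditional (v_neg) velocities.

    Illustrative only: lists of floats stand in for tensors.
    """
    return [guidance_scale * p + (1.0 - guidance_scale) * n for p, n in zip(v_pos, v_neg)]


v_pos = [1.0, 2.0]  # stands in for the prediction from `transformer` (text-conditioned)
v_neg = [0.5, 1.0]  # stands in for the prediction from `unconditional_transformer` (zeroed text)

only_conditional = blend_velocities(v_pos, v_neg, 1.0)  # scale 1.0 keeps v_pos unchanged
guided = blend_velocities(v_pos, v_neg, 7.0)            # scale > 1 extrapolates away from v_neg
```

A scale of 1.0 returns the conditional prediction as is; larger scales push the result further from the unconditional one.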

	#### __call__[[diffusers.Ideogram4Pipeline.__call__]]

	[Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py#L407)

	`__call__(prompt: str | list[str] | None = None, height: int = 2048, width: int = 2048, num_inference_steps: int = 48, guidance_scale: float | None = None, guidance_schedule: list[float] | torch.Tensor | None = (7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0), mu: float = 0.0, std: float = 1.5, max_sequence_length: int = 2048, num_images_per_prompt: int = 1, generator: torch._C.Generator | list[torch._C.Generator] | None = None, latents: torch.Tensor | None = None, output_type: str = 'pil', return_dict: bool = True, callback_on_step_end: typing.Optional[typing.Callable[[ForwardRef('Ideogram4Pipeline'), int, int, dict[str, typing.Any]], dict[str, typing.Any]]] = None, callback_on_step_end_tensor_inputs: list = ['latents'])`

	- prompt (`str` or `list[str]`) --
	Prompt(s) to guide image generation.
	- height (`int`, optional, defaults to 2048) --
	Output image height in pixels; must be a multiple of `vae_scale_factor * patch_size`.
	- width (`int`, optional, defaults to 2048) --
	Output image width in pixels; must be a multiple of `vae_scale_factor * patch_size`.
	- num_inference_steps (`int`, optional, defaults to 48) --
	Number of flow-matching steps. The default is the recommended setting for best quality.
	- guidance_scale (`float`, optional) --
	Constant classifier-free guidance scale applied at every step. The conditional and unconditional
	velocity predictions are blended as `v = guidance_scale * v_pos + (1 - guidance_scale) * v_neg`.
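	Mutually exclusive with `guidance_schedule`; setting both raises an error. Because `guidance_schedule` has a
	non-`None` default, pass `guidance_schedule=None` when setting this. Defaults to `None`.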
	- guidance_schedule (`list[float]` or `torch.Tensor`, optional) --
	Per-step guidance scale schedule; must have length `num_inference_steps`. The first entry corresponds
	to the first step (largest noise level). Mutually exclusive with `guidance_scale`; exactly one must be
	set. Defaults to the recommended schedule (7.0 for the main steps, dropping to 3.0 for the final 3
	"polish" steps). To use a constant scale instead, pass `guidance_scale` and `guidance_schedule=None`.
	- mu (`float`, optional, defaults to 0.0) --
	Base mean of the logit-normal flow-matching schedule. The schedule mean is shifted by half the log of
	the resolution ratio relative to 512x512.
	- std (`float`, optional, defaults to 1.5) --
	Standard deviation of the logit-normal flow-matching schedule.
	- max_sequence_length (`int`, optional, defaults to 2048) --
	Maximum number of text tokens per prompt.
	- num_images_per_prompt (`int`, optional, defaults to 1) --
	Number of images to generate per prompt.
	- generator (`torch.Generator` or `list[torch.Generator]`, optional) --
	Generator(s) used to make sampling deterministic.
	- latents (`torch.Tensor`, optional) --
	Pre-generated noise of shape `(batch_size, num_image_tokens, latent_dim)`.
	- output_type (`str`, optional, defaults to `"pil"`) --
	One of `"pil"`, `"np"`, `"pt"`, or `"latent"`.
	- return_dict (`bool`, optional, defaults to `True`) --
	Whether to return an [Ideogram4PipelineOutput](/docs/diffusers/pr_13859/en/api/pipelines/ideogram4#diffusers.pipelines.ideogram4.Ideogram4PipelineOutput) instead of a plain `tuple`.
	- callback_on_step_end (`Callable`, optional) --
	Callback invoked at the end of every denoising step.
	- callback_on_step_end_tensor_inputs (`list[str]`, optional) --
	Names of tensors to expose to the callback via `callback_kwargs`.
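The resolution shift described for `mu` can be worked through numerically. The sketch below assumes that "resolution ratio" means the pixel-count ratio `(height * width) / (512 * 512)` and that the shift is added to `mu`. Both are readings of the parameter description, not confirmed details of the pipeline, and `shifted_mu` is a hypothetical helper.

```python
import math


def shifted_mu(height, width, mu=0.0, base_resolution=512):
    """Shift the base mean by half the log of the pixel-count ratio to 512x512.

    Assumptions: ratio = (height * width) / (512 * 512), and the shift is additive.
    """
    ratio = (height * width) / (base_resolution * base_resolution)
    return mu + 0.5 * math.log(ratio)


mu_512 = shifted_mu(512, 512)      # ratio 1, so no shift
mu_1024 = shifted_mu(1024, 1024)   # ratio 4, shift of 0.5 * ln(4) = ln(2)
mu_2048 = shifted_mu(2048, 2048)   # ratio 16, shift of 0.5 * ln(16) = ln(4)
```

Under these assumptions, each doubling of both image sides adds `ln(2)` to the schedule mean.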

	Run text-to-image generation.

	Examples:
	```py
	>>> import torch
	>>> from diffusers import Ideogram4Pipeline

	>>> pipe = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16)
	>>> pipe.to("cuda")

	>>> prompt = "A photo of a cat holding a sign that says hello world"
	>>> # The defaults are the recommended settings for best quality.
	>>> image = pipe(prompt, height=2048, width=2048, generator=torch.Generator("cuda").manual_seed(0)).images[0]
	>>> image.save("ideogram4.png")
	```
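For the `callback_on_step_end` hook, the type annotation in the signature suggests a callable that takes the pipeline, two integers, and a dict of tensors, and returns a dict. The sketch below assumes the usual diffusers convention (step index, timestep, then `callback_kwargs`); the argument names and the `log_latents` function are illustrative.

```python
seen_steps = []


def log_latents(pipe, step_index, timestep, callback_kwargs):
    """Record which tensors were exposed at each step, then return the kwargs unchanged."""
    seen_steps.append((step_index, sorted(callback_kwargs)))
    return callback_kwargs


# Intended use (requires the model weights, so it is left commented out):
# image = pipe(
#     prompt,
#     callback_on_step_end=log_latents,
#     callback_on_step_end_tensor_inputs=["latents"],
# ).images[0]

# Stand-alone check of the callback contract with placeholder values.
result = log_latents(None, 0, 999, {"latents": [0.0]})
```

Returning the dict unchanged leaves the denoising loop unaffected; a callback that wants to modify a tensor would return an updated entry under the same key.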

	Parameters:

	scheduler ([FlowMatchEulerDiscreteScheduler](/docs/diffusers/pr_13859/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler)) : Flow-matching scheduler. The pipeline overrides the default sigma schedule with a resolution-aware logit-normal schedule.

	vae (`AutoencoderKLFlux2`) : Variational auto-encoder used to decode latents back into images.

	text_encoder (`PreTrainedModel`) : Multimodal text encoder. The pipeline consumes hidden states from a fixed set of intermediate decoder layers (see `QWEN3_VL_ACTIVATION_LAYERS`).

	tokenizer (`AutoTokenizer`) : Tokenizer paired with `text_encoder`.

	transformer ([Ideogram4Transformer2DModel](/docs/diffusers/pr_13859/en/api/models/ideogram4_transformer2d#diffusers.Ideogram4Transformer2DModel)) : Conditional flow-matching transformer.

	unconditional_transformer ([Ideogram4Transformer2DModel](/docs/diffusers/pr_13859/en/api/models/ideogram4_transformer2d#diffusers.Ideogram4Transformer2DModel)) : Unconditional (asymmetric-CFG) flow-matching transformer.

	Returns:

	[Ideogram4PipelineOutput](/docs/diffusers/pr_13859/en/api/pipelines/ideogram4#diffusers.pipelines.ideogram4.Ideogram4PipelineOutput) or `tuple`.

	#### encode_prompt[[diffusers.Ideogram4Pipeline.encode_prompt]]

	[Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py#L273)

	Prepare the conditioning for the packed text+image sequence (one entry per prompt).

	Returns a flat tuple `(prompt_embeds, position_ids, segment_ids, indicator)`. The unconditional branch carries
	no text, so the pipeline builds its (zeroed) inputs directly rather than encoding a negative prompt.

	## Ideogram4PipelineOutput[[diffusers.pipelines.ideogram4.Ideogram4PipelineOutput]]

	#### diffusers.pipelines.ideogram4.Ideogram4PipelineOutput[[diffusers.pipelines.ideogram4.Ideogram4PipelineOutput]]

	[Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_output.py#L24)

	Output class for the Ideogram 4 pipeline.

	Parameters:

	images (`list[PIL.Image.Image]` or `np.ndarray`) : List of denoised PIL images of length `batch_size`, or a NumPy array of shape `(batch_size, height, width, num_channels)`.
