| # Ideogram 4 | |
| Ideogram 4 is a flow-matching text-to-image model that uses a multimodal text encoder and an asymmetric | |
| classifier-free guidance scheme: a dedicated `unconditional_transformer` produces the negative branch with zeroed text | |
| features, while the main `transformer` consumes the full packed text + image sequence. | |
| The pipeline defaults are the recommended settings, so a plain `pipe(prompt)` call produces best-quality results | |
| out of the box: a 2048x2048 image from 48 flow-matching steps on a logit-normal schedule (`mu=0.0`, `std=1.5`), with | |
| classifier-free guidance held at 7.0 for the first 45 steps and dropped to 3.0 for the final 3 "polish" steps. | |
| Key inference-time knobs are exposed via the pipeline call: | |
| - `num_inference_steps`, `mu`, and `std` control the resolution-aware logit-normal flow-matching schedule. | |
| - `guidance_scale` (or a full per-step `guidance_schedule`) blends the conditional and unconditional velocities. | |
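As a rough illustration of the guidance knob, the sketch below builds a per-step schedule shaped like the default (7.0 for the main steps, 3.0 for the final 3) and applies the blend formula given for `guidance_scale` in the parameter reference. The helper names `make_guidance_schedule` and `blend_velocities` are hypothetical and not part of the pipeline; plain floats stand in for velocity tensors.

```python
def make_guidance_schedule(num_steps=48, main_scale=7.0, polish_scale=3.0, polish_steps=3):
    """Per-step guidance scales: `main_scale` first, `polish_scale` for the last steps."""
    if not 0 <= polish_steps <= num_steps:
        raise ValueError("polish_steps must be between 0 and num_steps")
    return [main_scale] * (num_steps - polish_steps) + [polish_scale] * polish_steps


def blend_velocities(v_pos, v_neg, scale):
    """Classifier-free guidance blend: scale * v_pos + (1 - scale) * v_neg."""
    return scale * v_pos + (1.0 - scale) * v_neg


schedule = make_guidance_schedule()
print(len(schedule), schedule[0], schedule[-3:])  # 48 7.0 [3.0, 3.0, 3.0]
print(blend_velocities(1.0, 0.5, 7.0))  # 4.0
```

A list built this way has length `num_inference_steps`, so it could be passed as `guidance_schedule`. A scale of 1.0 reduces the blend to the conditional prediction alone.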
| ## Text-to-image | |
| ```python | |
| import torch | |
| from diffusers import Ideogram4Pipeline | |
| pipe = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16) | |
| pipe.to("cuda") | |
| prompt = "A photo of a cat holding a sign that says hello world" | |
| # Sampling defaults (steps, schedule, guidance) are the recommended settings; only the size is lowered from 2048. | |
| image = pipe(prompt, height=1024, width=1024, generator=torch.Generator("cuda").manual_seed(0)).images[0] | |
| image.save("ideogram4.png") | |
| ``` | |
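The parameter reference requires `height` and `width` to be multiples of `vae_scale_factor * patch_size`. A minimal sketch of snapping a requested size to a valid one follows. The helper `snap_to_multiple` is hypothetical, and the values 8 and 2 are placeholders chosen for illustration, not values confirmed for this checkpoint.

```python
def snap_to_multiple(size, multiple):
    """Round `size` down to the nearest positive multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return max(multiple, (size // multiple) * multiple)


# Placeholder values for illustration only; look up the real ones for your checkpoint.
vae_scale_factor, patch_size = 8, 2
multiple = vae_scale_factor * patch_size

height = snap_to_multiple(1000, multiple)
width = snap_to_multiple(1024, multiple)
print(height, width)  # 992 1024
```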
| ## Ideogram4Pipeline[[diffusers.Ideogram4Pipeline]] | |
| #### diffusers.Ideogram4Pipeline[[diffusers.Ideogram4Pipeline]] | |
| [Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py#L133) | |
| Text-to-image pipeline for Ideogram4. | |
| Ideogram4 is a flow-matching model trained with asymmetric classifier-free guidance: a `transformer` consumes | |
| text-conditioned features alongside the image latents, while a separate `unconditional_transformer` denoises with | |
| zeroed text features. The two velocity predictions are linearly blended each step. | |
| #### __call__[[diffusers.Ideogram4Pipeline.__call__]] | |
| [Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py#L407) | |
| `__call__(prompt: str | list[str] | None = None, height: int = 2048, width: int = 2048, num_inference_steps: int = 48, guidance_scale: float | None = None, guidance_schedule: list[float] | torch.Tensor | None = (7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 7.0, 3.0, 3.0, 3.0), mu: float = 0.0, std: float = 1.5, max_sequence_length: int = 2048, num_images_per_prompt: int = 1, generator: torch.Generator | list[torch.Generator] | None = None, latents: torch.Tensor | None = None, output_type: str = 'pil', return_dict: bool = True, callback_on_step_end: Callable[[Ideogram4Pipeline, int, int, dict[str, Any]], dict[str, Any]] | None = None, callback_on_step_end_tensor_inputs: list = ['latents'])` | |
| - **prompt** (`str` or `list[str]`) -- | |
| Prompt(s) to guide image generation. | |
| - **height** (`int`, *optional*, defaults to 2048) -- | |
| Output image height in pixels; must be a multiple of `vae_scale_factor * patch_size`. | |
| - **width** (`int`, *optional*, defaults to 2048) -- | |
| Output image width in pixels; must be a multiple of `vae_scale_factor * patch_size`. | |
| - **num_inference_steps** (`int`, *optional*, defaults to 48) -- | |
| Number of flow-matching steps. The default is the recommended setting for best quality. | |
| - **guidance_scale** (`float`, *optional*) -- | |
| Constant classifier-free guidance scale applied at every step. The conditional and unconditional | |
| velocity predictions are blended as `v = guidance_scale * v_pos + (1 - guidance_scale) * v_neg`. | |
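| Mutually exclusive with `guidance_schedule`: setting both raises an error. Because `guidance_schedule` is set | |
| by default, pass `guidance_schedule=None` together with `guidance_scale`. Defaults to `None`. | |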
| - **guidance_schedule** (`list[float]` or `torch.Tensor`, *optional*) -- | |
| Per-step guidance scale schedule; must have length `num_inference_steps`. The first entry corresponds | |
| to the first step (largest noise level). Mutually exclusive with `guidance_scale`; exactly one must be | |
| set. Defaults to the recommended schedule (7.0 for the main steps, dropping to 3.0 for the final 3 | |
| "polish" steps). To use a constant scale instead, pass `guidance_scale` and `guidance_schedule=None`. | |
| - **mu** (`float`, *optional*, defaults to 0.0) -- | |
| Base mean of the logit-normal flow-matching schedule. The schedule mean is shifted by half the log of | |
| the resolution ratio relative to 512x512. | |
| - **std** (`float`, *optional*, defaults to 1.5) -- | |
| Standard deviation of the logit-normal flow-matching schedule. | |
| - **max_sequence_length** (`int`, *optional*, defaults to 2048) -- | |
| Maximum number of text tokens per prompt. | |
| - **num_images_per_prompt** (`int`, *optional*, defaults to 1) -- | |
| Number of images to generate per prompt. | |
| - **generator** (`torch.Generator` or `list[torch.Generator]`, *optional*) -- | |
| Generator(s) used to make sampling deterministic. | |
| - **latents** (`torch.Tensor`, *optional*) -- | |
| Pre-generated noise of shape `(batch_size, num_image_tokens, latent_dim)`. | |
| - **output_type** (`str`, *optional*, defaults to `"pil"`) -- | |
| One of `"pil"`, `"np"`, `"pt"`, or `"latent"`. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to return an [Ideogram4PipelineOutput](/docs/diffusers/pr_13859/en/api/pipelines/ideogram4#diffusers.pipelines.ideogram4.Ideogram4PipelineOutput). | |
| - **callback_on_step_end** (`Callable`, *optional*) -- | |
| Callback invoked at the end of every denoising step. | |
| - **callback_on_step_end_tensor_inputs** (`list[str]`, *optional*, defaults to `["latents"]`) -- | |
| Names of tensors to expose to the callback via `callback_kwargs`. | |
| Run text-to-image generation. | |
| Examples: | |
| ```py | |
| >>> import torch | |
| >>> from diffusers import Ideogram4Pipeline | |
| >>> pipe = Ideogram4Pipeline.from_pretrained("ideogram-ai/ideogram-v4", torch_dtype=torch.bfloat16) | |
| >>> pipe.to("cuda") | |
| >>> prompt = "A photo of a cat holding a sign that says hello world" | |
| >>> # The defaults are the recommended settings for best quality. | |
| >>> image = pipe(prompt, height=2048, width=2048, generator=torch.Generator("cuda").manual_seed(0)).images[0] | |
| >>> image.save("ideogram4.png") | |
| ``` | |
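A step-end callback can be sketched as below. The signature follows the `callback_on_step_end` type shown above: the pipeline, two integers, and a dict of tensors, returning a dict. Reading the two integers as step index and timestep is an assumption based on the usual diffusers convention. The name `log_step` is hypothetical, and the final call uses dummy values so the snippet runs without loading a model.

```python
from typing import Any

step_log: list[tuple[int, list[str]]] = []


def log_step(pipeline: Any, step: int, timestep: int, callback_kwargs: dict[str, Any]) -> dict[str, Any]:
    """Record which tensors were exposed at each step, then return them unchanged."""
    step_log.append((step, sorted(callback_kwargs)))
    return callback_kwargs


# Stand-alone check with dummy values; no model is loaded here.
out = log_step(None, 0, 999, {"latents": [0.0, 0.0]})
print(step_log)
```

With a loaded pipeline, the function would be passed as `callback_on_step_end=log_step`, with `callback_on_step_end_tensor_inputs=["latents"]` selecting which tensors appear in `callback_kwargs`.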
| **Parameters:** | |
| scheduler ([FlowMatchEulerDiscreteScheduler](/docs/diffusers/pr_13859/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler)) : Flow-matching scheduler. The pipeline overrides the default sigma schedule with a resolution-aware logit-normal schedule. | |
| vae (`AutoencoderKLFlux2`) : Variational auto-encoder used to decode latents back into images. | |
| text_encoder (`PreTrainedModel`) : Multimodal text encoder. The pipeline consumes hidden states from a fixed set of intermediate decoder layers (see `QWEN3_VL_ACTIVATION_LAYERS`). | |
| tokenizer (`AutoTokenizer`) : Tokenizer paired with `text_encoder`. | |
| transformer ([Ideogram4Transformer2DModel](/docs/diffusers/pr_13859/en/api/models/ideogram4_transformer2d#diffusers.Ideogram4Transformer2DModel)) : Conditional flow-matching transformer. | |
| unconditional_transformer ([Ideogram4Transformer2DModel](/docs/diffusers/pr_13859/en/api/models/ideogram4_transformer2d#diffusers.Ideogram4Transformer2DModel)) : Unconditional (asymmetric-CFG) flow-matching transformer. | |
| **Returns:** | |
| [Ideogram4PipelineOutput](/docs/diffusers/pr_13859/en/api/pipelines/ideogram4#diffusers.pipelines.ideogram4.Ideogram4PipelineOutput) or `tuple`. | |
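The resolution-aware schedule described for `mu` and `std` can be illustrated with a small sketch. Two assumptions are made here and are not confirmed by this page: the "resolution ratio" is read as the pixel-count ratio `(height * width) / (512 * 512)`, and the noise levels are taken as the sigmoid of evenly spaced quantiles of a normal distribution. The function names are hypothetical, and this is not the pipeline's own implementation.

```python
import math
from statistics import NormalDist


def shifted_mu(mu, height, width, base=512):
    """Shift the base mean by half the log of the pixel-count ratio (assumed reading)."""
    return mu + 0.5 * math.log((height * width) / (base * base))


def logit_normal_sigmas(num_steps, mu, std):
    """Assumed construction: sigmoid of evenly spaced normal quantiles, high noise first."""
    normal = NormalDist(mu, std)
    sigmas = []
    for i in range(num_steps):
        q = 1.0 - (i + 0.5) / num_steps  # quantile strictly inside (0, 1), descending
        z = normal.inv_cdf(q)
        sigmas.append(1.0 / (1.0 + math.exp(-z)))
    return sigmas


mu_eff = shifted_mu(0.0, 2048, 2048)  # 0.5 * log(16) = log(4)
sigmas = logit_normal_sigmas(48, mu_eff, 1.5)
print(round(mu_eff, 4), len(sigmas))
```

Under this reading the shift is zero at 512x512 and grows with resolution, which moves more of the sampling steps toward high noise levels for larger images.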
| #### encode_prompt[[diffusers.Ideogram4Pipeline.encode_prompt]] | |
| [Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py#L273) | |
| Prepare the conditioning for the packed text+image sequence (one entry per prompt). | |
| Returns a flat tuple `(prompt_embeds, position_ids, segment_ids, indicator)`. The unconditional branch carries | |
| no text, so the pipeline builds its (zeroed) inputs directly rather than encoding a negative prompt. | |
| ## Ideogram4PipelineOutput[[diffusers.pipelines.ideogram4.Ideogram4PipelineOutput]] | |
| #### diffusers.pipelines.ideogram4.Ideogram4PipelineOutput[[diffusers.pipelines.ideogram4.Ideogram4PipelineOutput]] | |
| [Source](https://github.com/huggingface/diffusers/blob/vr_13859/src/diffusers/pipelines/ideogram4/pipeline_output.py#L24) | |
| Output class for the Ideogram 4 pipeline. | |
| **Parameters:** | |
| images (`list[PIL.Image.Image]` or `np.ndarray`) : List of denoised PIL images of length `batch_size`, or numpy array of shape `(batch_size, height, width, num_channels)`. | |