---
license: other
license_name: ideogram-4-non-commercial
license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md
pipeline_tag: text-to-image
tags:
- text-to-image
- image-generation
- diffusion
- flow-matching
- dit
- ideogram
---

# Ideogram 4: Open image model at the forefront of design

A collage of Ideogram 4 samples spanning photorealism, illustration, typography, and poster design

Ideogram 4 is **[Ideogram](https://ideogram.ai)'s first open-source text-to-image model**. It is a **state-of-the-art foundation model trained from scratch**, not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images.

The easiest way to try the model is online at **[ideogram.ai](https://ideogram.ai/)**.

We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence.

## Table of Contents

1. [News](#news)
2. [Model Zoo](#model-zoo)
3. [Performance](#performance)
4. [Quick Start](#quick-start)
5. [Model Summary](#model-summary)
6. [Prompting Guide](#prompting-guide)
7. [Documentation](#documentation)
8. [Citation](#citation)

## News

* **[2026-06-03]** **Ideogram 4 released!** Inference code and weights are now public, and our [technical blog post](https://ideogram.ai/blog/ideogram-4.0/) is live. See the [Quick Start](#quick-start) section to generate your first image, or try the model online at [ideogram.ai](https://ideogram.ai/).

## Model Zoo

| Model | Params | Weight Quantization | Supported Hardware | Diffusers Support | License |
| :--- | :---: | :---: | :---: | :---: | :---: |
| **[Ideogram 4 (nf4)](https://huggingface.co/ideogram-ai/ideogram-4-nf4)** | 9.3B | nf4 | CUDA | Yes | [Ideogram 4 Non-Commercial](https://huggingface.co/ideogram-ai/ideogram-4-nf4/blob/main/LICENSE.md) |
| **[Ideogram 4 (fp8)](https://huggingface.co/ideogram-ai/ideogram-4-fp8)** | 9.3B | fp8 | All | No | [Ideogram 4 Non-Commercial](https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md) |

We plan to support more quantizations in the future.

## Performance

We evaluate Ideogram 4 across third-party arenas and benchmarks, standard open-source benchmarks, and our own internal human-preference benchmark.
Across all of them, **Ideogram 4 is the best open-weight image model by far, and sits at the frontier of design.**

### Design Arena

[Design Arena](https://www.designarena.ai/) is a third-party image Elo leaderboard focused specifically on design-oriented generation. On the overall board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary GPT and Gemini models:

Design Arena overall image Elo leaderboard with Ideogram 4.0 as the top open-weight model

Filtered to open-weight models only, Ideogram 4 leads by a commanding margin, well ahead of the next-best open model:

Design Arena open-weight image Elo leaderboard, with Ideogram 4.0 well ahead of all other open models

### ContraLabs

[ContraLabs](https://contralabs.com/research) ran a blind typography evaluation judged by ten professional designers from Contra's top-earning talent. Ideogram 4 leads on first-place win rate, picked as the best of four models 47.9% of the time overall, well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%, FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%):

ContraLabs typography first-place win rate, with Ideogram v4 leading

It also wins on practical usability: asked "Would you use this in real client work?", the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49):

ContraLabs 'would you use this in real client work?' rating, with Ideogram v4 leading

### LMArena

On [LMArena](https://lmarena.ai/), a third-party leaderboard for general-purpose text-to-image generation, Ideogram is the top-ranked open-weight lab and a top-5 image generation lab overall, beaten only by giant companies with vastly larger budgets and resources:

LMArena text-to-image lab leaderboard with Ideogram

### Ideogram internal eval

Our internal human-preference benchmark focuses on graphic design and photography. Graphic designers deeply familiar with professional design work rated the images blind. Bradley-Terry scores rank Ideogram 4 #2 overall, behind only GPT Image 2 medium, and first among open-weight models:

Ideogram internal design leaderboard with Ideogram 4.0
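
For readers unfamiliar with Bradley-Terry scoring, the sketch below shows how pairwise blind preferences can be turned into a ranking. It is only an illustration of the general technique: this README does not describe how the internal scores were actually fit, the win counts are invented toy data, and the function names are hypothetical. Under the Bradley-Terry model, item `i` is preferred over item `j` with probability `s_i / (s_i + s_j)`, and the strengths `s` are estimated here with the standard minorization-maximization update.

```python
def fit_bradley_terry(wins, iters=200):
    """Estimate Bradley-Terry strengths from pairwise preference counts.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Each pair contributes (number of comparisons) / (s_i + s_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


def win_probability(strengths, i, j):
    """Modelled probability that item i is preferred over item j."""
    return strengths[i] / (strengths[i] + strengths[j])


# Toy data (invented): three models, ten blind comparisons per pair.
toy_wins = [
    [0, 7, 9],
    [3, 0, 6],
    [1, 4, 0],
]
strengths = fit_bradley_terry(toy_wins)
ranking = sorted(range(len(strengths)), key=lambda k: strengths[k], reverse=True)
print(ranking, [round(s, 3) for s in strengths])
```

Sorting by fitted strength gives the leaderboard order, and `win_probability` gives the modelled head-to-head preference rate between any two entries.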

### Open-source benchmarks

On standard open-source benchmarks measuring core capabilities, namely layout control (7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering (X-Omni OCR), and prompt alignment (Prism), Ideogram 4 closes the gap to the leading closed-source models across every axis. On layout control (7Bench), it is significantly better than all closed-source models:

Five-axis capability radar comparing Ideogram 4.0 to leading closed-source models on layout control, spatial reasoning, object fidelity, prompt alignment, and text rendering

At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight release we benchmarked — ahead of much larger models like Qwen-Image (20B), FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE):

Parameter-efficiency scatter plot showing Ideogram 4.0 at 9.3B parameters leading all other open-weight models on text rendering

## Quick Start

### Install

The inference code lives in the [`ideogram4`](https://github.com/ideogram-oss/ideogram4) GitHub repo. Clone it, then run the following from the repo root:

```bash
pip install .
```

If you plan to modify the code, install in editable mode instead so changes under `src/ideogram4/` take effect without reinstalling:

```bash
pip install -e .
```

### CLI

A "magic prompt" LLM rewrites the plain `--prompt` into the structured JSON caption the model expects. By default this uses Ideogram's hosted magic-prompt API, which is **free** and does the expansion server-side (no local model or system prompt needed). It reads `IDEOGRAM_API_KEY`; get a key at [developer.ideogram.ai](https://developer.ideogram.ai/):

```bash
python run_inference.py \
  --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY"
```

You can also run the expansion through your own LLM provider: one of our magic-prompt system prompts is **open source**. See the [Prompting Guide](https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md#magic-prompt) for details.

For the highest-quality images, set `--height 2048 --width 2048` and `--sampler-preset V4_QUALITY_48`.

#### Safety screening with Hive

Prompt and output safety screening is performed via [Hive](https://thehive.ai/). Sign up and create a Text Moderation key and a Visual Content Moderation key, then export them as `HIVE_TEXT_MODERATION_KEY` and `HIVE_VISUAL_MODERATION_KEY` (or pass them via `--hive-text-key` / `--hive-visual-key`).

```bash
python run_inference.py \
  --prompt "an isometric illustration of a tiny city floating in the clouds" \
  --output out.png \
  --quantization "nf4" \
  --magic-prompt-key "$IDEOGRAM_API_KEY" \
  --hive-text-key "$HIVE_TEXT_MODERATION_KEY" \
  --hive-visual-key "$HIVE_VISUAL_MODERATION_KEY"
```

For sampler presets, parameter reference, and optimization tips, see [docs/inference.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/inference.md).

## Model Summary

Ideogram 4 is a **foundation model trained entirely from scratch**, not a fine-tune or distillation of any existing checkpoint. It is a flow-matching text-to-image model built on a **fully single-stream** Diffusion Transformer (DiT) architecture.

**Architecture:**

- **Fully single-stream DiT.** Text and image tokens are concatenated into one unified sequence and processed through the same 34-layer transformer, with no separate text or image branches. This enables deep cross-modal interaction at every layer.
- **Vision-language model as text encoder.** Instead of a text-only encoder like CLIP or T5, Ideogram 4 uses [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct), a full vision-language model that provides far richer understanding of visual concepts. Hidden states are extracted from **13 intermediate layers** and concatenated, giving the model multi-scale semantic features ranging from surface-level token information to deep compositional understanding.
- **Dual-branch classifier-free guidance.** The conditional (positive) and unconditional (negative) branches can be independently refined, enabling separate control over prompt adherence and image quality.
- **Flexible resolution.** Native support for any resolution from 256 to 2048 (multiples of 16), with aspect ratios up to 6:1. A single model handles everything from square thumbnails to ultrawide banners, with the noise schedule auto-adjusting per resolution.

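
The resolution constraints in the last bullet can be checked before launching a generation. The following is a minimal sketch, assuming the limits apply exactly as stated (each side between 256 and 2048, a multiple of 16, long side at most 6 times the short side). The helper names `snap_side` and `validate_resolution` are hypothetical and are not part of the `ideogram4` package, which may enforce these rules differently.

```python
MIN_SIDE = 256    # smallest supported side length, from the text above
MAX_SIDE = 2048   # largest supported side length
STEP = 16         # sides must be multiples of 16
MAX_ASPECT = 6.0  # longest side may be at most 6x the shortest


def snap_side(side: int) -> int:
    """Round to the nearest multiple of 16 (ties round up), then clamp to the range."""
    snapped = (side + STEP // 2) // STEP * STEP
    return max(MIN_SIDE, min(MAX_SIDE, snapped))


def validate_resolution(width: int, height: int) -> None:
    """Raise ValueError if (width, height) violates the stated constraints."""
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"{name}={side} is outside {MIN_SIDE}..{MAX_SIDE}")
        if side % STEP:
            raise ValueError(f"{name}={side} is not a multiple of {STEP}")
    aspect = max(width, height) / min(width, height)
    if aspect > MAX_ASPECT:
        raise ValueError(f"aspect ratio {aspect:.2f}:1 exceeds {MAX_ASPECT:.0f}:1")


validate_resolution(2048, 2048)  # square, maximum size: passes
validate_resolution(1536, 256)   # exactly 6:1: passes
print(snap_side(1000))           # 1008
```

Note that 2048x256 satisfies the per-side limits but is 8:1, so a check like this would reject it on the aspect-ratio rule.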

**Key Capabilities:**

- **Extreme controllability.** Ideogram 4 is trained on structured JSON captions, giving users unprecedented control over composition, style, lighting, color palette, typography, and spatial layout, all from a single prompt.
- **State-of-the-art text rendering.** Ideogram 4 delivers best-in-class in-image text generation (signage, logos, captions, watermarks, multi-line text) with high fidelity directly from the prompt.
- **Spatial layout control.** Bounding-box coordinates in the prompt allow explicit placement of subjects, text elements, and background regions.
- **Color palette conditioning.** Specify hex colors in the prompt to steer the image's dominant color scheme.

For full architecture details, see [docs/model_architecture.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/model_architecture.md). For a walkthrough of how the pipeline components fit together, see [docs/pipeline.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/pipeline.md).

## Prompting Guide

Ideogram 4 is trained exclusively on **structured JSON captions**. While plain-text prompts work, you will get the best results by providing a JSON object that follows our caption schema.

Key points:

- **Use JSON prompts** for maximum controllability: the model was trained on them and understands the structure natively.
- **Color palette conditioning:** specify a `colour_palette` array of hex colors in the style description to steer the image's color scheme.
- **Aspect ratio flexibility:** Ideogram 4 supports a wide range of aspect ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This is a key advantage for practical use: portraits, landscapes, banners, phone wallpapers, social media formats, etc.
- **Bounding-box layout:** specify `bbox` coordinates in the prompt to explicitly place subjects, text elements, and background regions.
- **Compositional control:** use `compositional_deconstruction` with bounding boxes and per-element descriptions for precise spatial layout.

**Why JSON-only training?** We train exclusively on JSON so that training and inference share a single, common prompt format. The training captions themselves are deliberately **extremely descriptive**: each JSON exhaustively describes everything in the image to maximize training efficiency. The more text-to-image relationships each caption pins down, the more grounded supervision the model extracts from a single training pair, rather than having to infer those relationships across many sparsely-captioned samples.

**Why JSON at inference time?** Because the model was trained on captions that name every object explicitly, the most reliable way to get every requested object rendered is to mirror that pattern. Plain-text prompts still work, but won't perform as well since the model was only trained on structured JSON captions.

**Don't want to write JSON by hand?** That's what *magic prompt* is for: it uses an LLM to expand a plain-text prompt into a full structured caption before generation, so you get JSON-quality results from a casual prompt. It runs by default in `run_inference.py` (see the [CLI](#cli) section).

See [docs/prompting.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md) for a full guide.

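
To make the key points above concrete, here is a rough sketch of assembling a structured caption in Python. Only the key names `colour_palette`, `bbox`, and `compositional_deconstruction` come from this README. Every other key (`description`, `style`), the nesting, and the bounding-box convention (`[x_min, y_min, x_max, y_max]` as fractions of the canvas) are placeholders invented for illustration, and `build_prompt` is a hypothetical helper. Consult docs/prompting.md for the actual caption schema before relying on any of this.

```python
import json
import re

HEX_COLOUR = re.compile(r"^#[0-9A-Fa-f]{6}$")


def build_prompt(description, palette, elements):
    """Assemble an illustrative structured caption and return it as a JSON string.

    `elements` is a list of (description, bbox) pairs. The bbox convention
    [x_min, y_min, x_max, y_max] in 0..1 is an assumption, not the documented one.
    """
    bad = [c for c in palette if not HEX_COLOUR.match(c)]
    if bad:
        raise ValueError(f"not 6-digit hex colours: {bad}")
    for _, bbox in elements:
        x0, y0, x1, y1 = bbox
        if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
            raise ValueError(f"bbox out of range or inverted: {bbox}")
    caption = {
        "description": description,            # hypothetical key
        "style": {"colour_palette": palette},  # "style" container is hypothetical
        "compositional_deconstruction": [
            {"description": text, "bbox": list(bbox)}  # per-element keys are hypothetical
            for text, bbox in elements
        ],
    }
    return json.dumps(caption, indent=2)


prompt_json = build_prompt(
    "a minimalist poster for a jazz festival",
    ["#1B1F3B", "#F2A900", "#FFFFFF"],
    [
        ("the headline text 'BLUE NOTE NIGHTS'", (0.10, 0.05, 0.90, 0.25)),
        ("a saxophone silhouette", (0.25, 0.30, 0.75, 0.95)),
    ],
)
print(prompt_json)
```

Validating colours and boxes up front catches malformed values before a generation run is spent on them; how the resulting string is passed to `run_inference.py` is covered in the prompting guide rather than assumed here.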
## Documentation

| Document | Description |
| :------- | :---------- |
| [docs/prompting.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md) | How to write JSON prompts, color palette conditioning, aspect ratios |
| [docs/inference.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/inference.md) | Sampler presets, parameter reference, resolutions, optimization tips |
| [docs/model_architecture.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/model_architecture.md) | Architecture diagram, DiT spec, component details |
| [docs/pipeline.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/pipeline.md) | Conceptual pipeline walkthrough: how all components fit together |
| [docs/development.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/development.md) | Dev setup, pre-commit hooks, contributing |
| [docs/safety.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/safety.md) | Pre-training, post-training, and inference-time safety mitigations; how to report violations |

## Citation

If you find the provided code or models useful for your research, consider citing them as:

```bibtex
@misc{ideogram-4-2026,
  author={Ideogram AI},
  title={{Ideogram 4}},
  year={2026},
  howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}},
}
```

## We're Hiring!

We're looking for **Research Scientists** and **Research Engineers** to work on next-generation generative models and the products built on top of them. If you are interested, please apply at [jobs.ashbyhq.com/ideogram](https://jobs.ashbyhq.com/ideogram).