Text-to-Image
Diffusers
Safetensors
Ideogram4Pipeline
image-generation
diffusion
flow-matching
dit
ideogram
Instructions to use cocktailpeanut/ideogram-4-nf4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Diffusers
How to use cocktailpeanut/ideogram-4-nf4 with Diffusers:
pip install -U diffusers transformers accelerate
import torch from diffusers import DiffusionPipeline # switch to "mps" for apple devices pipe = DiffusionPipeline.from_pretrained("cocktailpeanut/ideogram-4-nf4", dtype=torch.bfloat16, device_map="cuda") prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k" image = pipe(prompt).images[0] - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- Draw Things
- DiffusionBee
| license: other | |
| license_name: ideogram-4-non-commercial | |
| license_link: https://huggingface.co/ideogram-ai/ideogram-4-nf4/blob/main/LICENSE.md | |
| pipeline_tag: text-to-image | |
| tags: | |
| - text-to-image | |
| - image-generation | |
| - diffusion | |
| - flow-matching | |
| - dit | |
| - ideogram | |
| <p align="center"><a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="assets/ideogram_logo.svg" alt="Ideogram" width="500"></a></p> | |
| <p align="center"><em>Ideogram 4: Open image model at the forefront of design</em></p> | |
| <p align="center"> | |
| <a href="https://ideogram.ai/blog/ideogram-4.0/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Blog-Post-orange" alt="Blog Post"></a> | |
| <a href="https://github.com/ideogram-oss/ideogram4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Code-GitHub-181717?logo=github" alt="Code"></a> | |
| <a href="https://huggingface.co/collections/ideogram-ai/ideogram-4" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Model-HuggingFace-blue?logo=huggingface" alt="Model"></a> | |
| <a href="https://developer.ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/API-developer.ideogram.ai-purple" alt="API"></a> | |
| <a href="https://ideogram.ai/" target="_blank" rel="noopener noreferrer"><img src="https://img.shields.io/badge/Official%20Site-ideogram.ai-ff69b4" alt="Official Site"></a> | |
| </p> | |
| <p align="center"> | |
| <img src="assets/samples/collage_landscape.jpg" alt="A collage of Ideogram 4 samples spanning photorealism, illustration, typography, and poster design"> | |
| </p> | |
| Ideogram 4 is **[Ideogram](https://ideogram.ai)'s first open-source text-to-image model**. It is a **state-of-the-art foundation model trained from scratch** — not a fine-tune of any existing model. It introduces a new structured JSON prompting interface, with best-in-class multilingual text rendering, deep language understanding, explicit bounding-box layout and color-palette controls, and native 2k resolution images. The easiest way to try the model is online at **[ideogram.ai](https://ideogram.ai/)**. | |
| We believe openness drives innovation, and we invite the research community to innovate with us on the forefront of visual intelligence. | |
| ## Table of Contents | |
| 1. [News](#news) | |
| 2. [Model Zoo](#model-zoo) | |
| 3. [Performance](#performance) | |
| 4. [Quick Start](#quick-start) | |
| 5. [Model Summary](#model-summary) | |
| 6. [Prompting Guide](#prompting-guide) | |
| 7. [Documentation](#documentation) | |
| 8. [Citation](#citation) | |
| ## News | |
| * **[2026-06-03]** **Ideogram 4 released!** Inference code and weights | |
| are now public, and our [technical blog post](https://ideogram.ai/blog/ideogram-4.0/) is live. See the | |
| [Quick Start](#quick-start) section to generate your first image, or try the | |
| model online at [ideogram.ai](https://ideogram.ai/). | |
| ## Model Zoo | |
| | Model | Params | Weight Quantization | Supported Hardware | Diffusers Support | License | | |
| | :--- | :---: | :---: | :---: | :---: | :---: | | |
| | **[Ideogram 4 (nf4)](https://huggingface.co/ideogram-ai/ideogram-4-nf4)** | 9.3B | nf4 | CUDA | Yes | [Ideogram 4 Non-Commercial](https://huggingface.co/ideogram-ai/ideogram-4-nf4/blob/main/LICENSE.md) | | |
| | **[Ideogram 4 (fp8)](https://huggingface.co/ideogram-ai/ideogram-4-fp8)** | 9.3B | fp8 | All | No | [Ideogram 4 Non-Commercial](https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md) | | |
| We plan to support more quantizations in the future. | |
| ## Performance | |
| We evaluate Ideogram 4 across third-party arenas and benchmarks, standard | |
| open-source benchmarks, and our own internal human-preference benchmark. Across | |
| all of them, **Ideogram 4 is the best open-weight image model by far, and sits | |
| at the frontier of design.** | |
| ### Design Arena | |
| [Design Arena](https://www.designarena.ai/) is a third-party image Elo | |
| leaderboard focused specifically on design-oriented generation. On the overall | |
| board, Ideogram 4 is the top-ranked open-weight model, trailing only proprietary | |
| GPT and Gemini models: | |
| <p align="center"> | |
| <img src="assets/benchmarks/design_arena.png" alt="Design Arena overall image Elo leaderboard with Ideogram 4.0 as the top open-weight model"> | |
| </p> | |
| Filtered to open-weight models only, Ideogram 4 leads by a commanding margin, | |
| well ahead of the next-best open model: | |
| <p align="center"> | |
| <img src="assets/benchmarks/design_arena2.png" alt="Design Arena open-weight image Elo leaderboard, with Ideogram 4.0 well ahead of all other open models"> | |
| </p> | |
| ### ContraLabs | |
| [ContraLabs](https://contralabs.com/research) ran a blind typography evaluation judged by | |
| ten professional designers from Contra's top-earning talent. Ideogram 4 leads on | |
| first-place win rate, picked as the best of four models 47.9% of the time | |
| overall — well ahead of Gemini 3.1 Flash Image Preview (Nano Banana 2) at 30.0%, | |
| FLUX.2 [max] (15.5%), and Grok Imagine 1.0 (15.0%): | |
| <p align="center"> | |
| <img src="assets/benchmarks/contralabs_typography.png" alt="ContraLabs typography first-place win rate, with Ideogram v4 leading"> | |
| </p> | |
| It also wins on practical usability: asked "Would you use this in real client | |
| work?", the same designers rated Ideogram 4 highest at 3.55 / 5 — significantly | |
| above Nano Banana 2 (2.84), Grok Imagine 1.0 (2.61), and FLUX.2 [max] (2.49): | |
| <p align="center"> | |
| <img src="assets/benchmarks/contralabs_typography2.png" alt="ContraLabs 'would you use this in real client work?' rating, with Ideogram v4 leading"> | |
| </p> | |
| ### LMArena | |
| On [LMArena](https://lmarena.ai/), a third-party text-to-image leaderboard that | |
| measures general-purpose text-to-image use cases, Ideogram is the top-ranked | |
| open-weight lab and a top-5 image generation lab overall — beaten only by giant | |
| companies with vastly larger budgets and resources: | |
| <p align="center"> | |
| <img src="assets/benchmarks/lmarena_benchmark.png" alt="LMArena text-to-image lab leaderboard with Ideogram"> | |
| </p> | |
| ### Ideogram internal eval | |
| For our internal human-preference benchmark, focused on graphic design and | |
| photography, we had graphic designers deeply familiar with professional design | |
| work do the rating blind. Bradley-Terry scores rank Ideogram 4 #2 overall — | |
| behind only GPT Image 2 medium — and the top open-weight model: | |
| <p align="center"> | |
| <img src="assets/benchmarks/ideogram_benchmark.png" alt="Ideogram internal design leaderboard with Ideogram 4.0"> | |
| </p> | |
| ### Open-source benchmarks | |
| On standard open-source benchmarks measuring core capabilities — layout control | |
| (7Bench), spatial reasoning and object fidelity (SpatialGenEval), text rendering | |
| (X-Omni OCR), and prompt alignment (Prism) — Ideogram 4 closes the gap to the | |
| leading closed-source models across every axis. On layout control (7Bench), it | |
| is significantly better than all closed-source models: | |
| <p align="center"> | |
| <img src="assets/benchmarks/opensource.png" alt="Five-axis capability radar comparing Ideogram 4.0 to leading closed-source models on layout control, spatial reasoning, object fidelity, prompt alignment, and text rendering"> | |
| </p> | |
| At 9.3B parameters, Ideogram 4 delivers the best text rendering of any open-weight | |
| release we benchmarked — ahead of much larger models like Qwen-Image (20B), | |
| FLUX.2 [dev] (32B), and HunyuanImage 3.0 (80B MoE): | |
| <p align="center"> | |
| <img src="assets/benchmarks/opensource2.png" alt="Parameter-efficiency scatter plot showing Ideogram 4.0 at 9.3B parameters leading all other open-weight models on text rendering"> | |
| </p> | |
| ## Quick Start | |
| ### Install | |
| The inference code lives in the [`ideogram4`](https://github.com/ideogram-oss/ideogram4) GitHub repo. Clone it, then from the repo root: | |
| ```bash | |
| pip install . | |
| ``` | |
| If you plan to modify the code, install in editable mode instead so changes | |
| under `src/ideogram4/` take effect without reinstalling: | |
| ```bash | |
| pip install -e . | |
| ``` | |
| ### CLI | |
| The plain `--prompt` is rewritten into the structured JSON caption the model | |
| expects by a "magic prompt" LLM. By default this uses Ideogram's hosted | |
| magic-prompt API, which is **free** and does the expansion server-side (no local | |
| model or system prompt needed). It reads `IDEOGRAM_API_KEY` — get a key at | |
| [developer.ideogram.ai](https://developer.ideogram.ai/): | |
| ```bash | |
| python run_inference.py \ | |
| --prompt "a ginger cat wearing a tiny wizard hat reading a spellbook" \ | |
| --output out.png \ | |
| --quantization "nf4" \ | |
| --magic-prompt-key "$IDEOGRAM_API_KEY" | |
| ``` | |
| You can also run the expansion through your own LLM provider — one of our magic-prompt | |
| system prompt is **open source**. See the | |
| [Prompting Guide](https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md#magic-prompt) for details. | |
| For the highest-quality images, set `--height 2048 --width 2048` and | |
| `--sampler-preset V4_QUALITY_48`. | |
| #### Safety screening with Hive | |
| Prompt and output safety screening is performed via [Hive](https://thehive.ai/). | |
| Sign up and create a Text Moderation key and a Visual Content Moderation key, | |
| then export them as `HIVE_TEXT_MODERATION_KEY` and `HIVE_VISUAL_MODERATION_KEY` | |
| (or pass them via `--hive-text-key` / `--hive-visual-key`). | |
| ```bash | |
| python run_inference.py \ | |
| --prompt "an isometric illustration of a tiny city floating in the clouds" \ | |
| --output out.png \ | |
| --quantization "nf4" \ | |
| --magic-prompt-key "$MAGIC_PROMPT_API_KEY" \ | |
| --hive-text-key "$HIVE_TEXT_MODERATION_KEY" \ | |
| --hive-visual-key "$HIVE_VISUAL_MODERATION_KEY" | |
| ``` | |
| For sampler presets, parameter reference, and optimization tips, see | |
| [docs/inference.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/inference.md). | |
| ## Model Summary | |
| Ideogram 4 is a **foundation model trained entirely from scratch**, not a | |
| fine-tune or distillation of any existing checkpoint. It is a flow-matching | |
| text-to-image model built on a **fully single-stream** Diffusion Transformer | |
| (DiT) architecture. | |
| **Architecture:** | |
| - **Fully single-stream DiT.** Text and image tokens are concatenated into one | |
| unified sequence and processed through the same 34-layer transformer, with no | |
| separate text or image branches. This enables deep cross-modal interaction at | |
| every layer. | |
| - **Vision-language model as text encoder.** Instead of a text-only encoder | |
| like CLIP or T5, Ideogram 4 uses | |
| [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct), | |
| a full vision-language model that provides far richer understanding of visual | |
| concepts. Hidden states are extracted from **13 intermediate layers** and | |
| concatenated, giving the model multi-scale semantic features ranging from | |
| surface-level token information to deep compositional understanding. | |
| - **Dual-branch classifier-free guidance.** The conditional (positive) and | |
| unconditional (negative) branches can be independently refined, enabling | |
| separate control over prompt adherence and image quality. | |
| - **Flexible resolution.** Native support for any resolution from 256 to 2048 | |
| (multiples of 16), with aspect ratios up to 6:1. A single model handles | |
| everything from square thumbnails to ultrawide banners, with the noise | |
| schedule auto-adjusting per resolution. | |
| **Key Capabilities:** | |
| - **Extreme controllability.** Ideogram 4 is trained on structured JSON | |
| captions, giving users unprecedented control over composition, style, | |
| lighting, color palette, typography, and spatial layout, all from a single | |
| prompt. | |
| - **State-of-the-art text rendering.** Ideogram 4 delivers best-in-class | |
| in-image text generation (signage, logos, captions, watermarks, multi-line | |
| text) with high fidelity directly from the prompt. | |
| - **Spatial layout control.** Bounding-box coordinates in the prompt allow | |
| explicit placement of subjects, text elements, and background regions. | |
| - **Color palette conditioning.** Specify hex colors in the prompt to steer the | |
| image's dominant color scheme. | |
| For full architecture details, see | |
| [docs/model_architecture.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/model_architecture.md). For a walkthrough of | |
| how the pipeline components fit together, see | |
| [docs/pipeline.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/pipeline.md). | |
| ## Prompting Guide | |
| Ideogram 4 is trained exclusively on **structured JSON captions**. While | |
| plain-text prompts work, you will get the best results by providing a JSON | |
| object that follows our caption schema. | |
| Key points: | |
| - **Use JSON prompts** for maximum controllability — the model was trained on | |
| them and understands the structure natively. | |
| - **Color palette conditioning** — specify a `colour_palette` array of hex | |
| colors in the style description to steer the image's color scheme. | |
| - **Aspect ratio flexibility** — Ideogram 4 supports a wide range of aspect | |
| ratios (any multiple-of-16 resolution from 256 to 2048 on each side). This | |
| is a key advantage for practical use: portraits, landscapes, banners, | |
| phone wallpapers, social media formats, etc. | |
| - **Bounding-box layout** — specify `bbox` coordinates in the prompt to | |
| explicitly place subjects, text elements, and background regions. | |
| - **Compositional control** — use `compositional_deconstruction` with bounding | |
| boxes and per-element descriptions for precise spatial layout. | |
| **Why JSON-only training?** We train exclusively on JSON so that training | |
| and inference share a single, common prompt format. The training captions themselves are deliberately | |
| **extremely descriptive**: each JSON exhaustively describes everything in | |
| the image to maximize training efficiency. The more | |
| text-to-image relationships each caption pins down, the more grounded | |
| supervision the model extracts from a single training pair, rather than | |
| having to infer those relationships across many sparsely-captioned samples. | |
| **Why JSON at inference time?** Because the model was trained on captions | |
| that name every object explicitly, the most reliable way to get every | |
| requested object rendered is to mirror that pattern. Plain-text prompts still work, but | |
| won't perform as well since the model was only trained on structured JSON captions. | |
| **Don't want to write JSON by hand?** That's what *magic prompt* is for: it uses | |
| an LLM to expand a plain-text prompt into a full structured caption before | |
| generation, so you get JSON-quality results from a casual prompt. It runs by | |
| default in `run_inference.py` (see the [CLI](#cli) section). | |
| See [docs/prompting.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md) for a full guide. | |
| ## Documentation | |
| | Document | Description | | |
| | :------- | :---------- | | |
| | [docs/prompting.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/prompting.md) | How to write JSON prompts, color palette conditioning, aspect ratios | | |
| | [docs/inference.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/inference.md) | Sampler presets, parameter reference, resolutions, optimization tips | | |
| | [docs/model_architecture.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/model_architecture.md) | Architecture diagram, DiT spec, component details | | |
| | [docs/pipeline.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/pipeline.md) | Conceptual pipeline walkthrough — how all components fit together | | |
| | [docs/development.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/development.md) | Dev setup, pre-commit hooks, contributing | | |
| | [docs/safety.md](https://github.com/ideogram-oss/ideogram4/blob/main/docs/safety.md) | Pre-training, post-training, and inference-time safety mitigations; how to report violations | | |
| ## Citation | |
| If you find the provided code or models useful for your research, consider citing them as: | |
| ```bibtex | |
| @misc{ideogram-4-2026, | |
| author={Ideogram AI}, | |
| title={{Ideogram 4}}, | |
| year={2026}, | |
| howpublished={\url{https://ideogram.ai/blog/ideogram-4.0/}}, | |
| } | |
| ``` | |
| ## We're Hiring! | |
| We're looking for **Research Scientists** and **Research Engineers** to | |
| work on next-generation generative models and the products built on top of | |
| them. Interested candidates please apply https://jobs.ashbyhq.com/ideogram | |