---
license: other
license_name: ideogram-4-non-commercial
license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md
pipeline_tag: text-to-image
library_name: mlx
base_model: ideogram-ai/ideogram-4-fp8
tags:
- text-to-image
- image-generation
- diffusion
- flow-matching
- dit
- ideogram
- mlx
- apple-silicon
---
# Ideogram 4 — MLX (SceneWorks)
A native **Apple-silicon MLX** repackaging of [**Ideogram 4**](https://huggingface.co/ideogram-ai/ideogram-4-fp8) for **SceneWorks**. The weights are converted from Ideogram's official `fp8` reference release to MLX (bf16) and pre-quantized, so they load directly into SceneWorks' native Rust/MLX engine — **no PyTorch, no CUDA**.
This is a **weights-only** repackaging for inference on Apple silicon. The model, architecture, training, and capabilities are entirely Ideogram's; nothing about the model has been changed beyond the on-disk numeric format.
> ⚠️ **Non-commercial license.** These weights are governed by the [Ideogram Non-Commercial Model Agreement](LICENSE.md) — use is limited to non-commercial purposes. This is a private redistribution for use within SceneWorks. Review the license before any use.
## Versions
Two pre-quantized precisions ship as subfolders. Both use the same architecture and produce near-identical images (differences stay within quantization tolerance); choose based on your Mac's unified memory.
| Folder | Precision | On-disk size | Peak memory @ 1024² ¹ | Suggested min RAM ² |
|--------|-----------|--------:|--------------:|--------------------:|
| `q4/` | **Q4 (packed) — recommended** | ~14 GB | ~28 GB | 48 GB |
| `q8/` | Q8 (packed) | ~27 GB | ~40 GB | 64 GB |
¹ Runtime peak (weights + activations) at 1024², measured on a 128 GB Mac via `mlx_rs::memory`. Activation memory grows with image area (resolution²): at Q4 the peak is ~16 GB at 256², ~28 GB at 1024², and ~64 GB at 2048². The 2048² / 6:1 ceiling needs ~96 GB even at Q4.
² Recommended minimum unified memory for the 1024² default bucket.
**Q4 is the recommended default** — it renders with no visible quality loss versus bf16, at roughly a third of the memory and a quarter of the download.
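The version choice above reduces to a threshold check against the "Suggested min RAM" column. A minimal sketch, using a hypothetical helper `pick_version` (not part of SceneWorks), that keeps Q4 as the default and only returns Q8 when it is explicitly preferred and fits:

```python
from typing import Optional

# Suggested minimum unified memory (GB) for the 1024² default bucket, from the table above.
MIN_RAM_GB = {"q4": 48, "q8": 64}

def pick_version(unified_memory_gb: float, prefer_q8: bool = False) -> Optional[str]:
    """Return the version folder to use, or None if below every suggested minimum.

    Q4 is the default; Q8 is chosen only when explicitly preferred and it fits.
    """
    if prefer_q8 and unified_memory_gb >= MIN_RAM_GB["q8"]:
        return "q8"
    if unified_memory_gb >= MIN_RAM_GB["q4"]:
        return "q4"
    return None

print(pick_version(64))                  # q4
print(pick_version(64, prefer_q8=True))  # q8
print(pick_version(32))                  # None
```

Returning `None` below 48 GB only reflects the 1024² guidance; per footnote ¹, smaller resolutions peak lower, so a smaller Mac may still handle reduced sizes.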
Both folders are pre-quantized (packed): the two DiTs and the text encoder are stored as group-wise affine-quantized weights (group size 64). They therefore download smaller and load directly into quantized linear layers, without a transient dense copy in memory. The VAE and tokenizer stay dense.
> The full-precision **bf16** snapshot (~50 GB) is the dense source these versions are derived from. It is not hosted here because of its size; SceneWorks produces the packed versions from it offline (and can also quantize it to Q4/Q8 at load time without a dense-memory transient). Contact the SceneWorks team if you need it.
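The on-disk sizes in the table can be sanity-checked against the ~50 GB bf16 source. A back-of-envelope sketch, assuming each group of 64 weights stores one bf16 scale and one bf16 bias (16 bits each); treating the whole snapshot as quantizable slightly underestimates, since the VAE and tokenizer stay dense:

```python
def bits_per_weight(bits: int, group_size: int = 64, param_bits: int = 16) -> float:
    """Payload bits plus the amortized per-group scale and bias overhead."""
    return bits + 2 * param_bits / group_size

def estimated_size_gb(dense_bf16_gb: float, bits: int, group_size: int = 64) -> float:
    """Scale a dense bf16 size (16 bits/weight) to its packed equivalent."""
    return dense_bf16_gb * bits_per_weight(bits, group_size) / 16

print(round(estimated_size_gb(50, 4), 1))  # 14.1
print(round(estimated_size_gb(50, 8), 1))  # 26.6
```

Under these assumptions Q4 costs 4.5 bits per weight and Q8 costs 8.5, which lines up with the table's ~14 GB and ~27 GB.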
## Architecture
Ideogram 4 is a 9.3B-parameter single-stream flow-matching DiT (34 layers) with **asymmetric classifier-free guidance** (the unconditional branch is a separate transformer), a **Qwen3-VL-8B** text encoder (raw hidden states from 13 layers interleaved into 53,248 features), and the FLUX.2 VAE. Supported resolutions run from 256 to 2048 in multiples of 16, with aspect ratios up to 6:1. See the [original model card](https://huggingface.co/ideogram-ai/ideogram-4-fp8) for details.
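The shape numbers above can be checked with a little arithmetic. A sketch under stated assumptions (none of these come from this card): Qwen3-VL-8B's hidden size is 4096, so 13 tapped layers give 13 × 4096 = 53,248 features per token; "interleaved" is read as alternating layers feature by feature; "256–2048" is read as a per-side bound; and each 16×16 pixel patch is assumed to become one DiT token. All function names are hypothetical.

```python
QWEN3_VL_8B_HIDDEN = 4096  # assumed hidden size of the text encoder
TAPPED_LAYERS = 13         # from the card

def text_feature_width(hidden=QWEN3_VL_8B_HIDDEN, layers=TAPPED_LAYERS):
    """Width of the concatenated per-token text features."""
    return hidden * layers

def interleave_layers(per_layer_states):
    """Interleave one token's per-layer vectors: [l0[0], l1[0], ..., l0[1], l1[1], ...].
    The exact ordering Ideogram uses is an assumption."""
    return [v for features in zip(*per_layer_states) for v in features]

def check_size(width, height):
    """Return the DiT token count for a valid size, else raise ValueError."""
    for side in (width, height):
        if not 256 <= side <= 2048 or side % 16:
            raise ValueError(f"side {side} must be a multiple of 16 in [256, 2048]")
    if max(width, height) > 6 * min(width, height):
        raise ValueError("aspect ratio exceeds 6:1")
    return (width // 16) * (height // 16)  # assumes one token per 16x16 patch

print(text_feature_width())    # 53248
print(check_size(1024, 1024))  # 4096
```

If the resolution range instead describes an area-equivalent size rather than a per-side limit, the bounds in `check_size` would need to change accordingly.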
Each version folder contains the diffusers-style component tree: `transformer/`, `unconditional_transformer/`, `text_encoder/`, `vae/`, `tokenizer/`, `scheduler/`.
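The `transformer/` and `unconditional_transformer/` components exist because guidance is asymmetric: the unconditional prediction comes from its own network rather than from the conditional model with an empty prompt. A minimal sketch of how such a pair could drive flow-matching sampling, assuming the standard CFG combination `v = v_uncond + scale * (v_cond - v_uncond)` and a plain Euler integrator over linearly spaced times. The actual combination, guidance scale, and schedule used by Ideogram 4 and its `scheduler/` are not stated here, and the callables stand in for the real models.

```python
from typing import Callable, List

Vector = List[float]
# A velocity model maps (latent, timestep) -> predicted velocity. Text
# conditioning is assumed to be captured inside the conditional callable.
VelocityFn = Callable[[Vector, float], Vector]

def guided_velocity(cond: VelocityFn, uncond: VelocityFn, x: Vector, t: float, scale: float) -> Vector:
    """Classifier-free guidance: v = v_uncond + scale * (v_cond - v_uncond)."""
    v_c = cond(x, t)    # plays the role of `transformer/` (text-conditioned)
    v_u = uncond(x, t)  # plays the role of `unconditional_transformer/` (separate weights)
    return [u + scale * (c - u) for c, u in zip(v_c, v_u)]

def linear_timesteps(steps: int) -> List[float]:
    """Evenly spaced times from 1.0 (noise) to 0.0 (image); real schedulers may shift these."""
    return [1.0 - i / steps for i in range(steps + 1)]

def sample(cond: VelocityFn, uncond: VelocityFn, x_init: Vector,
           timesteps: List[float], scale: float) -> Vector:
    """Euler-integrate dx/dt = v from timesteps[0] to timesteps[-1]."""
    x = list(x_init)
    for t, t_next in zip(timesteps, timesteps[1:]):
        v = guided_velocity(cond, uncond, x, t, scale)
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x
```

With `scale = 1.0` the update reduces to the conditional model alone; larger scales push further along the direction separating the two predictions, which is why each step costs two DiT evaluations.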
## Prompting — structured JSON captions
Ideogram 4 was trained on **structured JSON captions**, not free text. A plain-text prompt yields a coherent but prompt-agnostic image, while a JSON caption (a high-level description, a style block, and a compositional deconstruction with normalized bounding boxes and color palettes) gives accurate prompt adherence. SceneWorks builds the JSON caption from its prompt UI, and plain-text prompts go through a magic-prompt expander. See the [original card](https://huggingface.co/ideogram-ai/ideogram-4-fp8) for the schema.
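A sketch of assembling such a caption, mainly to show the bounding-box normalization (pixel coordinates divided by image width and height). The key names `description`, `style`, `composition`, `label`, `bbox`, and `palette` are hypothetical placeholders, not Ideogram's schema; take the real field names and structure from the original card.

```python
import json

def normalize_bbox(x0, y0, x1, y1, width, height):
    """Convert a pixel box to [0, 1] coordinates (x_min, y_min, x_max, y_max)."""
    return [round(x0 / width, 4), round(y0 / height, 4),
            round(x1 / width, 4), round(y1 / height, 4)]

def build_caption(description, style, elements, width, height):
    """Assemble a JSON caption; all key names here are placeholders."""
    caption = {
        "description": description,  # high-level description
        "style": style,              # style block
        "composition": [             # compositional deconstruction
            {
                "label": e["label"],
                "bbox": normalize_bbox(*e["box_px"], width, height),
                "palette": e.get("palette", []),  # e.g. hex color strings
            }
            for e in elements
        ],
    }
    return json.dumps(caption, ensure_ascii=False)

print(build_caption(
    "A red fox in fresh snow",
    {"medium": "photograph"},
    [{"label": "fox", "box_px": (256, 0, 768, 512), "palette": ["#c0401a"]}],
    1024, 1024,
))
```

Normalizing boxes to [0, 1] keeps a caption valid across output sizes, so the same layout can be rendered at any supported resolution.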
## Usage
These weights are consumed by SceneWorks' native MLX engine (model id `ideogram_4`). They are **not** a diffusers / PyTorch snapshot and will not load with `diffusers` or `transformers`.
## Provenance & attribution
- **Model & weights:** © Ideogram, Inc. — [`ideogram-ai/ideogram-4-fp8`](https://huggingface.co/ideogram-ai/ideogram-4-fp8). Converted from the official fp8 reference to MLX bf16, then pre-quantized to packed Q4/Q8.
- **Conversion & quantization:** SceneWorks `mlx-gen-ideogram` (fp8→MLX converter + group-wise affine Q4/Q8 packer, byte-equivalent to load-time quantization).
- This is an **unofficial** community conversion for Apple-silicon inference, **not affiliated with or endorsed by Ideogram, Inc.**
All use of these weights is subject to the [Ideogram Non-Commercial Model Agreement](LICENSE.md).