metadata
license: other
license_name: ideogram-4-non-commercial
license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md
pipeline_tag: text-to-image
library_name: mlx
base_model: ideogram-ai/ideogram-4-fp8
tags:
  - text-to-image
  - image-generation
  - diffusion
  - flow-matching
  - dit
  - ideogram
  - mlx
  - apple-silicon

Ideogram 4 — MLX (SceneWorks)

A native Apple-silicon MLX repackaging of Ideogram 4 for SceneWorks. The weights are converted from Ideogram's official fp8 reference release to MLX (bf16) and pre-quantized, so they load directly into SceneWorks' native Rust/MLX engine — no PyTorch, no CUDA.

This is a weights-only repackaging for inference on Apple silicon. The model, architecture, training, and capabilities are entirely Ideogram's; nothing about the model has been changed beyond the on-disk numeric format.

⚠️ Non-commercial license. These weights are governed by the Ideogram Non-Commercial Model Agreement — use is limited to non-commercial purposes. This is a private redistribution for use within SceneWorks. Review the license before any use.

Versions

Two pre-quantized precisions ship as subfolders. They share the same architecture and produce the same images (within quantization tolerance); choose by your Mac's unified memory.

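| Folder | Precision | On-disk | Peak @1024² ¹ | Suggested min RAM ² |
|--------|-----------|---------|---------------|---------------------|
| q4/ | Q4 (packed), recommended | ~14 GB | ~28 GB | 48 GB |
| q8/ | Q8 (packed) | ~27 GB | ~40 GB | 64 GB |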

¹ Runtime peak (weights + activations) at 1024², measured on a 128 GB Mac via mlx_rs::memory. Activations grow with resolution²: Q4 peaks ~16 GB @256², ~28 GB @1024², ~64 GB @2048². The 2048²/6:1 ceiling needs ~96 GB even at Q4.

² Recommended minimum unified memory for the 1024² default bucket.
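The table above can be read as a simple lookup. The sketch below encodes only the suggested-minimum-RAM column; the function name is illustrative and is not part of SceneWorks.

```python
# Suggested minimum unified memory (GB) for the 1024² default bucket,
# copied from the table above.
SUGGESTED_MIN_RAM_GB = {"q4": 48, "q8": 64}


def folders_that_fit(unified_memory_gb: float) -> list[str]:
    """Return the version folders whose suggested minimum RAM is met."""
    return [
        name
        for name, needed in SUGGESTED_MIN_RAM_GB.items()
        if unified_memory_gb >= needed
    ]


print(folders_that_fit(48))   # a 48 GB Mac meets the Q4 suggestion only
print(folders_that_fit(128))  # a 128 GB Mac meets both
```

Larger buckets need more headroom than this lookup implies, since the figures apply to 1024² only.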

Q4 is the recommended default — it renders with no visible quality loss versus bf16, at roughly a third of the memory and a quarter of the download.

Both folders are pre-quantized (packed): the two DiTs and the text encoder are stored as group-wise affine quantized weights (group size 64), so they download smaller and load straight into quantized linears with no dense-memory transient. The VAE and tokenizer stay dense.
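For readers unfamiliar with the term, group-wise affine quantization stores each run of 64 weights as small integer codes plus one scale and one bias, so that a weight is approximated as `scale * code + bias`. The following is a minimal pure-Python sketch of that general idea. It assumes a min/max fit per group and does not reproduce MLX's actual rounding rules, bit packing, or file layout.

```python
def quantize_group(values, bits=4):
    """Affine-quantize one group so that w ≈ scale * code + bias."""
    levels = (1 << bits) - 1              # 15 for 4-bit, 255 for 8-bit
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo               # bias is the group minimum


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate weights from codes, scale, and bias."""
    return [scale * c + bias for c in codes]


def quantize_row(row, group_size=64, bits=4):
    """Split a weight row into consecutive groups and quantize each one."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]
```

Because every group gets its own scale and bias, an outlier in one group does not widen the quantization step of the others, and the reconstruction error per weight is bounded by half a step within each group.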

The full-precision bf16 snapshot (~50 GB) is the dense source these are derived from. It is not hosted here for size reasons; SceneWorks produces the packed versions from it offline (and can quantize it to Q4/Q8 at load time with no transient). Contact the SceneWorks team if you need it.
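The on-disk sizes can be sanity-checked from the ~50 GB bf16 figure. The sketch below assumes each group of 64 weights carries one 16-bit scale and one 16-bit bias (an assumption about the packed layout, not something stated here) and ignores the small dense VAE and tokenizer.

```python
def bits_per_weight(bits: int, group_size: int = 64,
                    scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective storage per weight, including per-group scale and bias."""
    return bits + (scale_bits + bias_bits) / group_size


def estimated_size_gb(dense_bf16_gb: float, bits: int) -> float:
    """Scale a bf16 (16 bits/weight) size down to the quantized footprint."""
    return dense_bf16_gb * bits_per_weight(bits) / 16


for bits in (4, 8):
    print(f"Q{bits}: {bits_per_weight(bits)} bits/weight, "
          f"~{estimated_size_gb(50, bits):.1f} GB")
```

Under those assumptions the estimates come out to about 14.1 GB for Q4 and 26.6 GB for Q8, in line with the ~14 GB and ~27 GB listed under Versions.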

Architecture

Ideogram 4 is a 9.3B-parameter single-stream flow-matching DiT (34 layers) with asymmetric classifier-free guidance (a separate unconditional transformer), a Qwen3-VL-8B text encoder (raw hidden states from 13 layers interleaved into 53,248 features), and the FLUX.2 VAE. Supported resolutions run from 256 to 2048 pixels per side in multiples of 16, with aspect ratios up to 6:1. See the original model card for details.
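The resolution limits can be written as a small check. This sketch assumes the 256 to 2048 range applies to each side and that the 6:1 aspect limit is inclusive; the function name is illustrative. It also confirms the arithmetic behind the text-feature width.

```python
MIN_SIDE, MAX_SIDE, STEP, MAX_ASPECT = 256, 2048, 16, 6.0


def is_supported_resolution(width: int, height: int) -> bool:
    """Check side range, multiple-of-16 alignment, and aspect ratio."""
    for side in (width, height):
        if not (MIN_SIDE <= side <= MAX_SIDE) or side % STEP:
            return False
    return max(width, height) / min(width, height) <= MAX_ASPECT


# 53,248 text features spread over 13 tapped layers is 4,096 per layer.
assert 53_248 == 13 * 4_096

print(is_supported_resolution(1024, 1024))  # square default bucket
print(is_supported_resolution(2048, 256))   # 8:1, beyond the aspect limit
```

At the 2048 ceiling the short side therefore has to be at least 352 pixels, the smallest multiple of 16 that keeps the ratio within 6:1.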

Each version folder contains the diffusers-style component tree: transformer/, unconditional_transformer/, text_encoder/, vae/, tokenizer/, scheduler/.
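A quick way to confirm that a downloaded version folder is complete is to look for the six component directories. This is a minimal sketch with an illustrative helper name; it checks only that the directories exist, not the files inside them.

```python
from pathlib import Path

# Component subfolders listed above.
COMPONENTS = (
    "transformer",
    "unconditional_transformer",
    "text_encoder",
    "vae",
    "tokenizer",
    "scheduler",
)


def missing_components(version_dir) -> list[str]:
    """Return the component folders that are absent from version_dir."""
    root = Path(version_dir)
    return [name for name in COMPONENTS if not (root / name).is_dir()]
```

An empty list from `missing_components("q4")` means all six directories are present.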

Prompting — structured JSON captions

Ideogram 4 was trained on structured JSON captions, not free text. A plain-text prompt yields a coherent but prompt-agnostic image, while a JSON caption (a high-level description, a style block, and a compositional deconstruction with normalized bounding boxes and color palettes) gives accurate adherence. SceneWorks builds the JSON caption from its prompt UI (with a magic-prompt expander for plain text). See the original card for the schema.
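To make the three parts concrete, here is a rough sketch of what such a caption could look like. Every key name, the example content, and the 0 to 1 bounding-box convention are hypothetical placeholders chosen for illustration; the real schema is defined in the original card and may differ.

```python
import json


def normalize_box(x0, y0, x1, y1, width, height):
    """Convert a pixel box to fractions of the image size (assumed 0..1)."""
    return [
        round(x0 / width, 4),
        round(y0 / height, 4),
        round(x1 / width, 4),
        round(y1 / height, 4),
    ]


# Hypothetical key names, NOT Ideogram's schema.
caption = {
    "description": "A red bicycle leaning against a white brick wall",
    "style": {"medium": "photograph", "palette": ["#C0392B", "#F5F5F5"]},
    "elements": [
        {
            "label": "bicycle",
            "box": normalize_box(256, 384, 768, 896, 1024, 1024),
            "palette": ["#C0392B"],
        },
    ],
}

print(json.dumps(caption, indent=2))
```

Normalizing boxes to the image size keeps a caption independent of the resolution bucket it is rendered at.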

Usage

These weights are consumed by SceneWorks' native MLX engine (model id ideogram_4). They are not a diffusers / PyTorch snapshot and will not load with diffusers or transformers.
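If the files need to be fetched by hand rather than through SceneWorks, one option is to download a single precision folder with `huggingface_hub`. This is a sketch under assumptions: the repository id is passed in by the caller because it is not stated here, a private repository requires an authenticated login, and SceneWorks may well manage downloads on its own.

```python
def patterns_for(precision: str) -> list[str]:
    """Glob patterns that select one version folder and everything in it."""
    if precision not in ("q4", "q8"):
        raise ValueError("precision must be 'q4' or 'q8'")
    return [f"{precision}/*"]


def download_version(repo_id: str, precision: str = "q4") -> str:
    """Download one precision folder and return the local snapshot path."""
    # Third-party dependency, imported lazily: pip install huggingface_hub
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=patterns_for(precision),
    )
```

Restricting the patterns to one folder avoids pulling both precisions, which together come to roughly 41 GB.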

Provenance & attribution

  • Model & weights: © Ideogram, Inc. — ideogram-ai/ideogram-4-fp8. Converted from the official fp8 reference to MLX bf16, then pre-quantized to packed Q4/Q8.
  • Conversion & quantization: SceneWorks mlx-gen-ideogram (fp8→MLX converter + group-wise affine Q4/Q8 packer, byte-equivalent to load-time quantization).
  • This is an unofficial community conversion for Apple-silicon inference, not affiliated with or endorsed by Ideogram, Inc.

All use of these weights is subject to the Ideogram Non-Commercial Model Agreement.