	---
	license: other
	license_name: ideogram-4-non-commercial
	license_link: https://huggingface.co/ideogram-ai/ideogram-4-fp8/blob/main/LICENSE.md
	pipeline_tag: text-to-image
	library_name: mlx
	base_model: ideogram-ai/ideogram-4-fp8
	tags:
	- text-to-image
	- image-generation
	- diffusion
	- flow-matching
	- dit
	- ideogram
	- mlx
	- apple-silicon
	---

	# Ideogram 4 — MLX (SceneWorks)

	A native Apple-silicon MLX repackaging of [Ideogram 4](https://huggingface.co/ideogram-ai/ideogram-4-fp8) for SceneWorks. The weights are converted from Ideogram's official `fp8` reference release to MLX (bf16) and pre-quantized, so they load directly into SceneWorks' native Rust/MLX engine — no PyTorch, no CUDA.

	This is a weights-only repackaging for inference on Apple silicon. The model, architecture, training, and capabilities are entirely Ideogram's; nothing about the model has been changed beyond the on-disk numeric format.

	> ⚠️ Non-commercial license. These weights are governed by the [Ideogram Non-Commercial Model Agreement](LICENSE.md) — use is limited to non-commercial purposes. This is a private redistribution for use within SceneWorks. Review the license before any use.

	## Versions

	Two pre-quantized precisions ship as subfolders. They share the same architecture and produce the same images (within quantization tolerance); choose by your Mac's unified memory.

	| Folder | Precision | On-disk | Peak @1024² ¹ | Suggested min RAM ² |
	|--------|-----------|--------:|--------------:|--------------------:|
	| `q4/` | Q4 (packed), recommended | ~14 GB | ~28 GB | 48 GB |
	| `q8/` | Q8 (packed) | ~27 GB | ~40 GB | 64 GB |

	¹ Runtime peak (weights + activations) at 1024², measured on a 128 GB Mac via `mlx_rs::memory`. Activation memory scales with pixel count (the square of the side length), so the peak rises with resolution: Q4 peaks at ~16 GB @256², ~28 GB @1024², and ~64 GB @2048². The 2048²/6:1 ceiling needs ~96 GB even at Q4.
	² Recommended minimum unified memory for the 1024² default bucket.
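
The table can be turned into a small selection helper. The sketch below is illustrative only: `pick_folder` and its `prefer_q8` flag are hypothetical names, not part of SceneWorks, and the thresholds are simply the "Suggested min RAM" values for the 1024² default bucket.

```python
# Thresholds copied from the "Suggested min RAM" column above (1024² bucket).
# The function name and return convention are assumptions for illustration.
SUGGESTED_MIN_RAM_GB = {"q8/": 64, "q4/": 48}


def pick_folder(unified_memory_gb, prefer_q8=False):
    """Return "q4/" or "q8/", or None if the Mac is below both suggested minimums."""
    if prefer_q8 and unified_memory_gb >= SUGGESTED_MIN_RAM_GB["q8/"]:
        return "q8/"
    if unified_memory_gb >= SUGGESTED_MIN_RAM_GB["q4/"]:
        return "q4/"  # Q4 is the recommended default
    return None


if __name__ == "__main__":
    for ram in (32, 48, 64, 128):
        print(ram, pick_folder(ram), pick_folder(ram, prefer_q8=True))
```

Larger resolutions need more headroom than these thresholds imply, so treat the result as a starting point for the default bucket only.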

	Q4 is the recommended default — it renders with no visible quality loss versus bf16, at roughly a third of the memory and a quarter of the download.

	Both folders are pre-quantized (packed): the two DiTs and the text encoder are stored as group-wise affine quantized weights (group size 64), so they download smaller and load straight into quantized linear layers without first materializing dense weights in memory. The VAE and tokenizer stay dense.
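
For readers unfamiliar with the term, group-wise affine quantization stores each group of weights as small integer codes plus one scale and one bias per group. The following is a minimal pure-Python sketch of that idea, assuming a min/max affine mapping; it does not reproduce the packed on-disk layout or the exact rounding used by the SceneWorks packer, and all function names are hypothetical.

```python
def quantize_group(values, bits):
    """Affine-quantize one group to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # constant group: avoid a zero scale
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo  # lo acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Recover approximate weights: w ≈ code * scale + bias."""
    return [c * scale + bias for c in codes]


def quantize_row(row, bits=4, group_size=64):
    """Split a weight row into groups, each with its own scale and bias."""
    if len(row) % group_size:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


if __name__ == "__main__":
    import math
    row = [math.sin(i / 7.0) for i in range(128)]
    groups = quantize_row(row, bits=4)
    restored = [w for g in groups for w in dequantize_group(*g)]
    print(max(abs(a - b) for a, b in zip(row, restored)))
```

In this scheme the reconstruction error within a group is bounded by half that group's scale, which is why smaller groups (here 64 weights) track the original values more closely than one scale per tensor would.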

	> The full-precision bf16 snapshot (~50 GB) is the dense source these are derived from. It is not hosted here for size reasons; SceneWorks produces the packed versions from it offline (and can quantize it to Q4/Q8 at load time with no transient). Contact the SceneWorks team if you need it.

	## Architecture

	Ideogram 4 is a 9.3B-parameter single-stream flow-matching DiT (34 layers) with asymmetric classifier-free guidance, using a separate unconditional transformer. Its text encoder is Qwen3-VL-8B (raw hidden states from 13 layers interleaved into 53,248 features), and its VAE is the FLUX.2 VAE. Supported resolutions run from 256 to 2048 in multiples of 16, with aspect ratios up to 6:1. See the [original model card](https://huggingface.co/ideogram-ai/ideogram-4-fp8) for details.
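
The resolution limits above can be checked before a request is submitted. This is a sketch under one stated assumption: that the 256–2048 range and the multiple-of-16 rule apply to each side independently. The function name is hypothetical; consult the original model card for the authoritative bucket list.

```python
def is_supported_size(width, height, lo=256, hi=2048, step=16, max_aspect=6.0):
    """Check a (width, height) pair against the limits quoted in this card.

    Assumes the range and step apply per side, which this card does not
    state explicitly.
    """
    for side in (width, height):
        if not (lo <= side <= hi) or side % step:
            return False
    return max(width, height) / min(width, height) <= max_aspect


if __name__ == "__main__":
    for size in [(1024, 1024), (1536, 256), (2048, 256), (1000, 1000)]:
        print(size, is_supported_size(*size))
```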

	Each version folder contains the diffusers-style component tree: `transformer/`, `unconditional_transformer/`, `text_encoder/`, `vae/`, `tokenizer/`, `scheduler/`.
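
A quick way to confirm that a download is complete is to check for those six subfolders. The helper below is a minimal sketch; `missing_components` is a hypothetical name, and it only checks that the directories exist, not that their contents are valid.

```python
from pathlib import Path

# Component subfolders listed in this card for each version folder.
COMPONENTS = (
    "transformer",
    "unconditional_transformer",
    "text_encoder",
    "vae",
    "tokenizer",
    "scheduler",
)


def missing_components(version_dir):
    """Return the component subfolders that are absent from version_dir."""
    root = Path(version_dir)
    return [name for name in COMPONENTS if not (root / name).is_dir()]


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "q4"
    print(missing_components(target) or "all components present")
```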

	## Prompting — structured JSON captions

	Ideogram 4 was trained on structured JSON captions, not free text. A plain-text prompt yields a coherent image that largely ignores the prompt, while a JSON caption (a high-level description, a style block, and a compositional deconstruction with normalized bounding boxes and color palettes) is followed accurately. SceneWorks builds the JSON caption from its prompt UI (with a magic-prompt expander for plain text). See the [original card](https://huggingface.co/ideogram-ai/ideogram-4-fp8) for the schema.

	## Usage

	These weights are consumed by SceneWorks' native MLX engine (model id `ideogram_4`). They are not a diffusers / PyTorch snapshot and will not load with `diffusers` or `transformers`.

	## Provenance & attribution

	- Model & weights: © Ideogram, Inc. — [`ideogram-ai/ideogram-4-fp8`](https://huggingface.co/ideogram-ai/ideogram-4-fp8). Converted from the official fp8 reference to MLX bf16, then pre-quantized to packed Q4/Q8.
	- Conversion & quantization: SceneWorks `mlx-gen-ideogram` (fp8→MLX converter + group-wise affine Q4/Q8 packer, byte-equivalent to load-time quantization).
	- This is an unofficial community conversion for Apple-silicon inference, not affiliated with or endorsed by Ideogram, Inc.

	All use of these weights is subject to the [Ideogram Non-Commercial Model Agreement](LICENSE.md).