# HiDream-O1-Image – FP8 Mixed (ComfyUI)
This is the FP8 mixed-precision quantization of HiDream-O1-Image for use with ComfyUI. Quantized to 8-bit floats, the model fits comfortably within ~10 GB of VRAM, making it accessible on 12 GB GPUs (RTX 3080/4070/4080, etc.) with minimal quality trade-off.
Custom ComfyUI Node: Saganaki22/HiDream_O1-ComfyUI
## VRAM Requirements
| Precision | Approximate VRAM |
|---|---|
| BF16 | 17–20 GB |
| FP16 | 17–20 GB |
| FP8 Mixed (this repo) | ~10 GB |
This is the recommended variant for GPUs with less than 16 GB VRAM. Tested on 12 GB cards at 2048 × 2048 resolution.
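The VRAM figures in the table are roughly what weight storage alone implies. A quick back-of-envelope check (the helper function is ours, for illustration only):

```python
# Rough VRAM needed for the weights alone; activations, the CUDA
# context, and ComfyUI overhead all come on top of this figure.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(f"BF16 (2 bytes/param): {weight_vram_gb(9, 2):.1f} GB")  # ≈16.8 GB
print(f"FP8  (1 byte/param):  {weight_vram_gb(9, 1):.1f} GB")  # ≈8.4 GB
```

The FP8 estimate lands under the ~10 GB observed in practice; the gap is runtime overhead plus the layers kept at higher precision.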
## What is FP8 Mixed?
Weights are stored in the `float8_e4m3fn` format. Sensitive layers (norms, embeddings, output heads) retain higher precision to preserve stability, hence "mixed." On GPUs with the Hopper or Ada Lovelace architecture (RTX 40xx, H100), FP8 compute is hardware-accelerated. On older GPUs, weights are dequantized on the fly, still saving VRAM at a small speed cost.
## Quick Start – ComfyUI
### 1. Install the Custom Node
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/HiDream_O1-ComfyUI
pip install -r HiDream_O1-ComfyUI/requirements.txt
```
Or install via ComfyUI Manager by searching for HiDream O1.
### 2. Download the Weights
```bash
huggingface-cli download drbaph/HiDream-O1-Image-FP8 \
  --local-dir ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8
```
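The snapshot can equivalently be fetched from Python with `huggingface_hub`; the wrapper function below is ours, not part of the repo:

```python
from huggingface_hub import snapshot_download

def fetch_fp8_weights(
    dest: str = "ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8",
) -> str:
    """Download this repo's FP8 weights into ComfyUI's model folder."""
    return snapshot_download(
        repo_id="drbaph/HiDream-O1-Image-FP8",
        local_dir=dest,
    )
```

Calling `fetch_fp8_weights()` returns the local path to the downloaded snapshot.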
### 3. Load in ComfyUI
Open ComfyUI and load the workflow provided in the custom node repository, then point the model loader to `HiDream-O1-Image-fp8`.
## About HiDream-O1-Image
HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT): no external VAEs, no disjoint text encoders. It encodes raw pixels, text, and task-specific conditions in a single shared token space, supporting:
- Text-to-image generation up to 2,048 × 2,048
- Instruction-based image editing
- Subject-driven personalization (multi-reference IP)
- Long-text and multilingual text rendering
At only 9B parameters it matches or exceeds much larger open-source DiTs and leading closed-source models. It debuted at #8 in the Artificial Analysis Text to Image Arena (2026-05-05).
## Key Features
- 🧬 Pixel-Level Unified Transformer – end-to-end on raw pixels, no VAE, no disjoint text encoder
- 🎨 One Model, Many Tasks – T2I, editing, personalization, storyboard generation
- 🧠 Reasoning-Driven Prompt Agent – built-in "thinking" agent that resolves layout and rendering before generation
- 🖼️ Native High Resolution – direct synthesis up to 2,048 × 2,048
- ⚡ 9B Parameters – performance parity with models many times larger
- 💾 FP8 Quantized – ~half the VRAM of full-precision variants, minimal quality loss
## Model Variants
| Repo | Precision | VRAM | Inference Steps |
|---|---|---|---|
| drbaph/HiDream-O1-Image-BF16 | BF16 | 17–20 GB | 50 |
| drbaph/HiDream-O1-Image-FP16 | FP16 | 17–20 GB | 50 |
| drbaph/HiDream-O1-Image-FP8 (this repo) | FP8 Mixed | ~10 GB | 50 |
| HiDream-ai/HiDream-O1-Image | Original | – | 50 |
| HiDream-ai/HiDream-O1-Image-Dev | Original Dev | – | 28 |
## Benchmark Results (from original model)
- **GenEval** (compositional generation): HiDream-O1-Image scores 0.90 overall at 9B params, second only to the 200B+ Pro variant and ahead of GPT Image 2 (0.89).
- **DPG-Bench** (dense prompt alignment): overall score 89.83, ranking second behind the Pro variant.
- **HPSv3** (human preference): overall score 10.37, outperforming GPT Image 2 (10.21) and Nano Banana 2.0 (10.01).
## License
The original HiDream-O1-Image model and code are released under the MIT License. This FP8 quantization inherits the same license.
## Links
- 📦 Original model: HiDream-ai/HiDream-O1-Image
- 🔧 ComfyUI node: Saganaki22/HiDream_O1-ComfyUI
- 📄 Technical report: HiDream-O1-Image.pdf
- 🤗 Online demo: HiDream-O1-Image Space