HiDream-O1-Image — BF16 (ComfyUI)

This is the BF16 conversion of HiDream-O1-Image for use with ComfyUI. Weights have been cast to bfloat16 for a balance of precision and memory efficiency.

Custom ComfyUI Node: Saganaki22/HiDream_O1-ComfyUI

VRAM Requirements

Precision	Approximate VRAM
BF16 (this repo)	17–20 GB
FP16	17–20 GB
FP8 Mixed	~10 GB

A GPU with at least 20 GB of VRAM is recommended for comfortable use at the full 2,048 × 2,048 resolution. 24 GB cards (RTX 3090/4090, A5000, etc.) leave headroom above the 17–20 GB BF16 footprint.
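
The figures in the table can be roughly sanity-checked from the parameter count. The sketch below assumes exactly 9e9 parameters (the "9B" figure quoted later in this card), 2 bytes per parameter for BF16 and FP16, and 1 byte for FP8. It counts weights only, so activations and runtime overhead come on top, and an FP8 Mixed checkpoint that keeps some layers at higher precision will land above the pure FP8 number. The function name is illustrative and not part of any library.

```python
# Rough weights-only memory estimate; activations and runtime overhead are extra.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp8": 1}


def weight_footprint_gb(num_params, precision):
    """Return the size of the weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


NUM_PARAMS = 9_000_000_000  # assumption: the "9B" figure, taken as exactly 9e9

for precision in ("bf16", "fp16", "fp8"):
    size = weight_footprint_gb(NUM_PARAMS, precision)
    print(f"{precision}: ~{size:.0f} GB of weights")
```

Under these assumptions the BF16 weights alone come to about 18 GB, which is consistent with the 17–20 GB range listed above.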

Quick Start — ComfyUI

1. Install the Custom Node

cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/HiDream_O1-ComfyUI
pip install -r HiDream_O1-ComfyUI/requirements.txt

Or install via ComfyUI Manager by searching for HiDream O1.

2. Download the Weights

huggingface-cli download drbaph/HiDream-O1-Image-BF16 \
    --local-dir ComfyUI/models/diffusion_models/HiDream-O1-Image-bf16
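
If you prefer to script the download, the same step can be sketched in Python with `snapshot_download` from the `huggingface_hub` package. This assumes `huggingface_hub` is installed and that ComfyUI lives in a folder named `ComfyUI` relative to the working directory; the helper names `weights_dir` and `download_weights` are illustrative, not part of the custom node.

```python
from pathlib import Path

REPO_ID = "drbaph/HiDream-O1-Image-BF16"


def weights_dir(comfy_root):
    """Build the target folder used by the CLI command above."""
    return Path(comfy_root) / "models" / "diffusion_models" / "HiDream-O1-Image-bf16"


def download_weights(comfy_root="ComfyUI"):
    """Download the repo into the ComfyUI models folder and return that path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = weights_dir(comfy_root)
    snapshot_download(repo_id=REPO_ID, local_dir=str(target))
    return target


# Usage (needs network access and enough free disk space):
#   download_weights("ComfyUI")
```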

3. Load in ComfyUI

Open ComfyUI and load the workflow provided in the custom node repository. Point the model loader at the HiDream-O1-Image-bf16 folder downloaded in step 2.

About HiDream-O1-Image

HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT) — no external VAEs, no disjoint text encoders. It encodes raw pixels, text, and task-specific conditions in a single shared token space, supporting:

Text-to-image generation up to 2,048 × 2,048
Instruction-based image editing
Subject-driven personalization (multi-reference IP)
Long-text and multilingual text rendering

According to the benchmarks reported for the original model, at only 9B parameters it matches or exceeds much larger open-source DiTs and leading closed-source models. It debuted at #8 in the Artificial Analysis Text to Image Arena (2026-05-05).

Key Features

🧬 Pixel-Level Unified Transformer — end-to-end on raw pixels, no VAE, no disjoint text encoder
🎨 One Model, Many Tasks — T2I, editing, personalization, storyboard generation
🧠 Reasoning-Driven Prompt Agent — built-in "thinking" agent that resolves layout and rendering before generation
🖼️ Native High Resolution — direct synthesis up to 2,048 × 2,048
⚡ 9B Parameters — performance parity with models many times larger

Model Variants

Repo	Precision	VRAM	Inference Steps
drbaph/HiDream-O1-Image-BF16 (this repo)	BF16	17–20 GB	50
drbaph/HiDream-O1-Image-FP16	FP16	17–20 GB	50
drbaph/HiDream-O1-Image-FP8	FP8 Mixed	~10 GB	50
HiDream-ai/HiDream-O1-Image	Original	—	50
HiDream-ai/HiDream-O1-Image-Dev	Original Dev	—	28
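
As a rough guide to choosing between the converted repos, the table can be turned into a small lookup. This is only a sketch: it treats the upper end of each VRAM range (20 GB for BF16 and FP16) and the approximate 10 GB FP8 figure as hard thresholds, which is an assumption, and `pick_variant` is a hypothetical helper rather than anything shipped with the custom node.

```python
# (repo, precision, approximate VRAM needed in GB, inference steps), from the table above.
VARIANTS = [
    ("drbaph/HiDream-O1-Image-BF16", "BF16", 20, 50),
    ("drbaph/HiDream-O1-Image-FP16", "FP16", 20, 50),
    ("drbaph/HiDream-O1-Image-FP8", "FP8 Mixed", 10, 50),
]


def pick_variant(vram_gb, prefer="BF16"):
    """Return the preferred variant if it fits, else the first that fits, else None."""
    fitting = [v for v in VARIANTS if v[2] <= vram_gb]
    if not fitting:
        return None
    for variant in fitting:
        if variant[1] == prefer:
            return variant
    return fitting[0]


print(pick_variant(24))  # a 24 GB card fits the BF16 repo
print(pick_variant(12))  # a 12 GB card falls back to FP8 Mixed
```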

Benchmark Results (from original model)

GenEval (compositional generation) — HiDream-O1-Image scores 0.90 overall at 9B params, second only to the 200B+ Pro variant and ahead of GPT Image 2 (0.89).

DPG-Bench (dense prompt alignment) — Overall score 89.83, ranking second behind the Pro variant.

HPSv3 (human preference) — Overall score 10.37, outperforming GPT Image 2 (10.21) and Nano Banana 2.0 (10.01).

License

The original HiDream-O1-Image model and code are released under the MIT License. This BF16 conversion inherits the same license.
