# HiDream-O1-Image – FP8 Mixed (ComfyUI)
This is the FP8 mixed-precision quantization of HiDream-O1-Image for use with ComfyUI. Quantized to 8-bit floats, the model fits comfortably within ~10 GB of VRAM, making it accessible on 12 GB GPUs (RTX 3080/4070/4080, etc.) with minimal quality trade-off.
Custom ComfyUI Node: Saganaki22/HiDream_O1-ComfyUI
## VRAM Requirements
| Precision | Approximate VRAM |
|---|---|
| BF16 | 17–20 GB |
| FP16 | 17–20 GB |
| FP8 Mixed (this repo) | ~10 GB |
This is the recommended variant for GPUs with less than 16 GB VRAM. Tested on 12 GB cards at 2048 × 2048 resolution.
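The VRAM figures in the table are roughly what weight storage alone implies. A quick back-of-envelope check (the helper function is ours, for illustration only):

```python
# Rough VRAM needed for the weights alone; activations, the CUDA
# context, and ComfyUI overhead all come on top of this figure.
def weight_vram_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

print(f"BF16 (2 bytes/param): {weight_vram_gb(9, 2):.1f} GB")  # ≈16.8 GB
print(f"FP8  (1 byte/param):  {weight_vram_gb(9, 1):.1f} GB")  # ≈8.4 GB
```

The FP8 estimate lands under the ~10 GB observed in practice; the gap is runtime overhead plus the layers kept at higher precision.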
## What is FP8 Mixed?
Weights are stored in the `float8_e4m3fn` format. Sensitive layers (norms, embeddings, output heads) retain higher precision to preserve stability, hence "mixed." On GPUs with the Hopper or Ada Lovelace architecture (RTX 40xx, H100), FP8 compute is hardware-accelerated. On older GPUs, weights are dequantized on the fly, still saving VRAM at a small speed cost.
## Quick Start – ComfyUI
### 1. Install the Custom Node
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/HiDream_O1-ComfyUI
pip install -r HiDream_O1-ComfyUI/requirements.txt
```
Or install via ComfyUI Manager by searching for HiDream O1.
### 2. Download the Weights
```bash
huggingface-cli download drbaph/HiDream-O1-Image-FP8 \
  --local-dir ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8
```
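The snapshot can equivalently be fetched from Python with `huggingface_hub`; the wrapper function below is ours, not part of the repo:

```python
from huggingface_hub import snapshot_download

def fetch_fp8_weights(
    dest: str = "ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8",
) -> str:
    """Download this repo's FP8 weights into ComfyUI's model folder."""
    return snapshot_download(
        repo_id="drbaph/HiDream-O1-Image-FP8",
        local_dir=dest,
    )
```

Calling `fetch_fp8_weights()` returns the local path to the downloaded snapshot.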
### 3. Load in ComfyUI
Open ComfyUI and load the workflow provided in the custom node repository, then point the model loader to `HiDream-O1-Image-fp8`.
## About HiDream-O1-Image
HiDream-O1-Image is a natively unified image generative foundation model built on a Pixel-level Unified Transformer (UiT): no external VAEs, no disjoint text encoders. It encodes raw pixels, text, and task-specific conditions in a single shared token space, supporting:
- Text-to-image generation up to 2,048 × 2,048
- Instruction-based image editing
- Subject-driven personalization (multi-reference IP)
- Long-text and multilingual text rendering
At only 9B parameters it matches or exceeds much larger open-source DiTs and leading closed-source models. It debuted at #8 in the Artificial Analysis Text to Image Arena (2026-05-05).
## Key Features
- 🧬 Pixel-Level Unified Transformer – end-to-end on raw pixels, no VAE, no disjoint text encoder
- 🎨 One Model, Many Tasks – T2I, editing, personalization, storyboard generation
- 🧠 Reasoning-Driven Prompt Agent – built-in "thinking" agent that resolves layout and rendering before generation
- 🖼️ Native High Resolution – direct synthesis up to 2,048 × 2,048
- ⚡ 9B Parameters – performance parity with models many times larger
- 💾 FP8 Quantized – ~half the VRAM of full-precision variants, minimal quality loss
## Model Variants
| Repo | Precision | VRAM | Inference Steps |
|---|---|---|---|
| drbaph/HiDream-O1-Image-BF16 | BF16 | 17–20 GB | 50 |
| drbaph/HiDream-O1-Image-FP16 | FP16 | 17–20 GB | 50 |
| drbaph/HiDream-O1-Image-FP8 (this repo) | FP8 Mixed | ~10 GB | 50 |
| HiDream-ai/HiDream-O1-Image | Original | – | 50 |
| HiDream-ai/HiDream-O1-Image-Dev | Original Dev | – | 28 |
## Benchmark Results (from original model)
- **GenEval** (compositional generation): HiDream-O1-Image scores 0.90 overall at 9B params, second only to the 200B+ Pro variant and ahead of GPT Image 2 (0.89).
- **DPG-Bench** (dense prompt alignment): overall score 89.83, ranking second behind the Pro variant.
- **HPSv3** (human preference): overall score 10.37, outperforming GPT Image 2 (10.21) and Nano Banana 2.0 (10.01).
## License
The original HiDream-O1-Image model and code are released under the MIT License. This FP8 quantization inherits the same license.
## Links
- 📦 Original model: HiDream-ai/HiDream-O1-Image
- 🔧 ComfyUI node: Saganaki22/HiDream_O1-ComfyUI
- 📄 Technical report: HiDream-O1-Image.pdf
- 🤗 Online demo: HiDream-O1-Image Space