---
license: other
license_name: flux-2-klein-4b-agreement
license_link: https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/blob/main/LICENSE
library_name: diffusers
pipeline_tag: image-to-image
tags:
- flux2
- fp8
- torchao
- diffusers
- transformer
base_model: black-forest-labs/FLUX.2-klein-4B
---

# FLUX.2-klein-4B FP8: Diffusers Transformer

Diffusers-compatible **transformer-only** weights for
[FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B),
converted from Black Forest Labs'
[FP8 checkpoint](https://huggingface.co/black-forest-labs/FLUX.2-klein-4b-fp8)
(ComfyUI format).

> **This repo does not contain the full pipeline.**
> Text encoders, VAE, and scheduler are loaded from
> [black-forest-labs/FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B).

## Available variants

| Subfolder | Precision | Format | Size | Use case |
|---|---|---|---|---|
| `transformer_bf16/` | bfloat16 | safetensors | ~7.7 GB | LoRA training, evaluation baselines, re-quantization |
| `transformer_fp8_static/` | float8_e4m3fn | torchao `.pt` | ~3.9 GB | Production inference (~2x memory saving) |

### bf16

Lossless dequantization of BFL's FP8 weights (bf16 can represent every
float8_e4m3fn value exactly). This is the recommended starting point for
fine-tuning or LoRA training: the weights are numerically identical to BFL's
original FP8 model.

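The representability claim can be checked exhaustively: float8_e4m3fn has only 256 bit patterns, and bfloat16 keeps the top 16 bits of a float32, so a value is exact in bf16 iff the low 16 bits of its float32 encoding are zero. A minimal pure-Python sketch (not part of this repo):

```python
import struct

def f32_bits(x: float) -> int:
    """IEEE-754 float32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

# Enumerate every finite float8_e4m3fn value:
#   normals:    sign * 2^(exp-7) * (1 + man/8), exp in 1..15
#   subnormals: sign * 2^-6 * (man/8),          exp == 0
# The pattern exp=15, man=7 is NaN (e4m3fn has no infinities).
values = []
for sign in (1.0, -1.0):
    for exp in range(16):
        for man in range(8):
            if exp == 15 and man == 7:
                continue  # NaN bit pattern
            if exp == 0:
                values.append(sign * (man / 8.0) * 2.0 ** -6)
            else:
                values.append(sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7))

# bfloat16 truncates float32 to its high 16 bits, so exact representability
# means the low 16 bits are zero for every e4m3fn value.
assert len(values) == 254
assert all(f32_bits(v) & 0xFFFF == 0 for v in values)
```

Every e4m3fn value has at most 3 significand bits and an exponent between -9 and 8, comfortably inside bf16's 7 mantissa bits and float32-sized exponent range.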
### FP8 static (torchao)

Both weights **and** activations are quantized to float8_e4m3fn. Activation scales
are the original per-layer `input_scale` values from BFL's calibration. The checkpoint
is a `torch.save` dict containing:

- `state_dict`: torchao `AffineQuantizedTensor` weights
- `act_scales`: per-Linear static activation scales (float32)
- `fp8_dtype`: the string `"float8_e4m3fn"`

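A sketch of that layout, with synthetic tensors and an illustrative layer name standing in for the real torchao `AffineQuantizedTensor` weights:

```python
import torch

# Synthetic stand-in for model_fp8_static.pt: same top-level keys as the
# real checkpoint, but plain tensors and a made-up layer name.
ckpt = {
    "state_dict": {"transformer_blocks.0.attn.to_q.weight": torch.zeros(8, 8)},
    "act_scales": {"transformer_blocks.0.attn.to_q": torch.tensor(0.03125)},
    "fp8_dtype": "float8_e4m3fn",
}
torch.save(ckpt, "model_fp8_static_demo.pt")

# Round-trip through torch.save / torch.load preserves the dict layout.
loaded = torch.load("model_fp8_static_demo.pt")
assert set(loaded) == {"state_dict", "act_scales", "fp8_dtype"}
assert loaded["fp8_dtype"] == "float8_e4m3fn"
```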
## Usage: bf16

```python
from diffusers import Flux2Transformer2DModel, Flux2KleinPipeline
from PIL import Image
import torch

# Load the bf16 transformer from this repo
transformer = Flux2Transformer2DModel.from_pretrained(
    "photoroom/FLUX.2-klein-4b-fp8-diffusers",
    subfolder="transformer_bf16",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load the pipeline (text encoders, VAE, scheduler come from BFL)
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Run image-to-image inference
image = Image.open("input.png").convert("RGB")
result = pipe(
    prompt="a product on a marble countertop",
    image=[image],
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
result.save("output.png")
```

## Usage: FP8 static

```python
from diffusers import Flux2Transformer2DModel, Flux2KleinPipeline
from huggingface_hub import hf_hub_download
from load_torchao import load_torchao_fp8_static_model
from PIL import Image
import torch

# Download the FP8 static checkpoint
ckpt_path = hf_hub_download(
    "photoroom/FLUX.2-klein-4b-fp8-diffusers",
    filename="transformer_fp8_static/model_fp8_static.pt",
)

# Rehydrate the quantized transformer on top of the bf16 base weights
transformer = load_torchao_fp8_static_model(
    ckpt_path=ckpt_path,
    base_model_or_factory=lambda: Flux2Transformer2DModel.from_pretrained(
        "photoroom/FLUX.2-klein-4b-fp8-diffusers",
        subfolder="transformer_bf16",
        torch_dtype=torch.bfloat16,
    ),
    device="cuda",
)

# Load the pipeline (text encoders, VAE, scheduler come from BFL)
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Run text-to-image inference (pass a PIL image instead of None for image-to-image)
# image = Image.open("input.png").convert("RGB")
result = pipe(
    prompt="a cat holding a frame with FP8 writing on it",
    image=[None],
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
result.save("output.png")
```

## Quality comparison: original bf16 vs. dequantized bf16 vs. FP8 static

Side-by-side text-to-image comparison at 1024x1024, 4 steps, `guidance_scale=1.0`.
Prompts were chosen to stress fine details, textures, gradients, and high-frequency patterns.

Each column shows: **Original BFL bf16** | **Dequantized bf16** (this repo) | **FP8 static** (this repo).

<img src='grid.png' width='3088'>

<details>
<summary>Prompts used</summary>

1. **Fine text + wood grain**: _"A close-up photograph of a vintage wooden sign that reads 'OPEN DAILY 9AM-6PM' in hand-painted white serif letters on a dark green background, peeling paint revealing wood grain underneath, tiny rusty nail heads, cobwebs in the corner, shot with a macro lens"_
2. **High-frequency fabric**: _"Flat lay photograph of a neatly folded black and white houndstooth wool blazer next to a herringbone tweed scarf on a clean white marble surface with fine grey veining, visible individual wool fibers, top-down view, 8K product photography"_
3. **Gradients + caustics**: _"A single chrome sphere resting on a wet black surface reflecting a sunset sky gradient from deep orange to violet, tiny water droplets scattered around it catching light as caustic sparkles, distant city skyline reflected in the sphere, photorealistic 8K"_
4. **Grass + nature macro**: _"Extreme close-up of a freshly mowed lawn with individual grass blades in sharp focus, morning dew droplets on each blade refracting light into tiny rainbows, a small ladybug crawling on one blade, scattered clover leaves with visible vein patterns, macro photography, f/2.8 bokeh in the background"_
5. **Architecture detail**: _"Aerial photograph of a Baroque cathedral rooftop showing hundreds of individual terracotta roof tiles, ornate stone gargoyles with weathered faces, tiny stained glass windows with visible lead cames, moss growing between cracks, pigeons perched on ledges, ultra detailed 8K drone photography"_

</details>

## License

This model is a derivative of FLUX.2-klein-4B and is subject to the
[FLUX.2-klein-4B license](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/blob/main/LICENSE).