---
license: other
license_name: flux-2-klein-4b-agreement
license_link: https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/blob/main/LICENSE
library_name: diffusers
pipeline_tag: image-to-image
tags:
- flux2
- fp8
- torchao
- diffusers
- transformer
base_model: black-forest-labs/FLUX.2-klein-4B
---

# FLUX.2-klein-4B FP8: Diffusers Transformer

Diffusers-compatible **transformer-only** weights for
[FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B),
converted from Black Forest Labs'
[FP8 checkpoint](https://huggingface.co/black-forest-labs/FLUX.2-klein-4b-fp8)
(ComfyUI format).

> **This repo does not contain the full pipeline.**
> Text encoders, VAE, and scheduler are loaded from
> [black-forest-labs/FLUX.2-klein-4B](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B).

## Available variants

| Subfolder | Precision | Format | Size | Use case |
|---|---|---|---|---|
| `transformer_bf16/` | bfloat16 | safetensors | ~7.7 GB | LoRA training, evaluation baselines, re-quantization |
| `transformer_fp8_static/` | float8_e4m3fn | torchao `.pt` | ~3.9 GB | Production inference (~2x memory saving) |

### bf16

Lossless dequantization of BFL's FP8 weights (bf16 can represent every
float8_e4m3fn value exactly). This is the recommended starting point for
fine-tuning or LoRA training: the weights are numerically identical to BFL's
original FP8 model.

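The representability claim can be checked exhaustively: float8_e4m3fn has only 256 bit patterns, and bfloat16 keeps the top 16 bits of a float32, so a value is exact in bf16 iff the low 16 bits of its float32 encoding are zero. A minimal pure-Python sketch (not part of this repo):

```python
import struct

def f32_bits(x: float) -> int:
    """IEEE-754 float32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

# Enumerate every finite float8_e4m3fn value:
#   normals:    sign * 2^(exp-7) * (1 + man/8), exp in 1..15
#   subnormals: sign * 2^-6 * (man/8),          exp == 0
# The pattern exp=15, man=7 is NaN (e4m3fn has no infinities).
values = []
for sign in (1.0, -1.0):
    for exp in range(16):
        for man in range(8):
            if exp == 15 and man == 7:
                continue  # NaN bit pattern
            if exp == 0:
                values.append(sign * (man / 8.0) * 2.0 ** -6)
            else:
                values.append(sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7))

# bfloat16 truncates float32 to its high 16 bits, so exact representability
# means the low 16 bits are zero for every e4m3fn value.
assert len(values) == 254
assert all(f32_bits(v) & 0xFFFF == 0 for v in values)
```

Every e4m3fn value has at most 3 significand bits and an exponent between -9 and 8, comfortably inside bf16's 7 mantissa bits and float32-sized exponent range.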
### FP8 static (torchao)

Both weights **and** activations are quantized to float8_e4m3fn. Activation scales
are the original per-layer `input_scale` values from BFL's calibration. The checkpoint
is a `torch.save` dict containing:

- `state_dict`: torchao `AffineQuantizedTensor` weights
- `act_scales`: per-Linear static activation scales (float32)
- `fp8_dtype`: the string `"float8_e4m3fn"`

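A sketch of that layout, with synthetic tensors and an illustrative layer name standing in for the real torchao `AffineQuantizedTensor` weights:

```python
import torch

# Synthetic stand-in for model_fp8_static.pt: same top-level keys as the
# real checkpoint, but plain tensors and a made-up layer name.
ckpt = {
    "state_dict": {"transformer_blocks.0.attn.to_q.weight": torch.zeros(8, 8)},
    "act_scales": {"transformer_blocks.0.attn.to_q": torch.tensor(0.03125)},
    "fp8_dtype": "float8_e4m3fn",
}
torch.save(ckpt, "model_fp8_static_demo.pt")

# Round-trip through torch.save / torch.load preserves the dict layout.
loaded = torch.load("model_fp8_static_demo.pt")
assert set(loaded) == {"state_dict", "act_scales", "fp8_dtype"}
assert loaded["fp8_dtype"] == "float8_e4m3fn"
```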
## Usage: bf16

```python
from diffusers import Flux2Transformer2DModel, Flux2KleinPipeline
from PIL import Image
import torch

# Load the bf16 transformer from this repo
transformer = Flux2Transformer2DModel.from_pretrained(
    "photoroom/FLUX.2-klein-4b-fp8-diffusers",
    subfolder="transformer_bf16",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load the pipeline (text encoders, VAE, scheduler come from BFL)
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Run image-to-image inference
image = Image.open("input.png").convert("RGB")
result = pipe(
    prompt="a product on a marble countertop",
    image=[image],
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
result.save("output.png")
```

## Usage: FP8 static

```python
from diffusers import Flux2Transformer2DModel, Flux2KleinPipeline
from huggingface_hub import hf_hub_download
from load_torchao import load_torchao_fp8_static_model
from PIL import Image
import torch

# Download the FP8 static checkpoint
ckpt_path = hf_hub_download(
    "photoroom/FLUX.2-klein-4b-fp8-diffusers",
    filename="transformer_fp8_static/model_fp8_static.pt",
)

# Rehydrate the quantized transformer on top of the bf16 base weights
transformer = load_torchao_fp8_static_model(
    ckpt_path=ckpt_path,
    base_model_or_factory=lambda: Flux2Transformer2DModel.from_pretrained(
        "photoroom/FLUX.2-klein-4b-fp8-diffusers",
        subfolder="transformer_bf16",
        torch_dtype=torch.bfloat16,
    ),
    device="cuda",
)

# Load the pipeline (text encoders, VAE, scheduler come from BFL)
pipe = Flux2KleinPipeline.from_pretrained(
    "black-forest-labs/FLUX.2-klein-4B",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Run text-to-image inference (pass a PIL image instead of None for image-to-image)
# image = Image.open("input.png").convert("RGB")
result = pipe(
    prompt="a cat holding a frame with FP8 writing on it",
    image=[None],
    height=1024,
    width=1024,
    guidance_scale=1.0,
    num_inference_steps=4,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
result.save("output.png")
```

## Quality comparison: original bf16 vs. dequantized bf16 vs. FP8 static

Side-by-side text-to-image comparison at 1024x1024, 4 steps, `guidance_scale=1.0`.
Prompts were chosen to stress fine details, textures, gradients, and high-frequency patterns.

Each column shows: **Original BFL bf16** | **Dequantized bf16** (this repo) | **FP8 static** (this repo).

<img src='grid.png' width='3088'>

<details>
<summary>Prompts used</summary>

1. **Fine text + wood grain**: _"A close-up photograph of a vintage wooden sign that reads 'OPEN DAILY 9AM-6PM' in hand-painted white serif letters on a dark green background, peeling paint revealing wood grain underneath, tiny rusty nail heads, cobwebs in the corner, shot with a macro lens"_
2. **High-frequency fabric**: _"Flat lay photograph of a neatly folded black and white houndstooth wool blazer next to a herringbone tweed scarf on a clean white marble surface with fine grey veining, visible individual wool fibers, top-down view, 8K product photography"_
3. **Gradients + caustics**: _"A single chrome sphere resting on a wet black surface reflecting a sunset sky gradient from deep orange to violet, tiny water droplets scattered around it catching light as caustic sparkles, distant city skyline reflected in the sphere, photorealistic 8K"_
4. **Grass + nature macro**: _"Extreme close-up of a freshly mowed lawn with individual grass blades in sharp focus, morning dew droplets on each blade refracting light into tiny rainbows, a small ladybug crawling on one blade, scattered clover leaves with visible vein patterns, macro photography, f/2.8 bokeh in the background"_
5. **Architecture detail**: _"Aerial photograph of a Baroque cathedral rooftop showing hundreds of individual terracotta roof tiles, ornate stone gargoyles with weathered faces, tiny stained glass windows with visible lead cames, moss growing between cracks, pigeons perched on ledges, ultra detailed 8K drone photography"_

</details>

## License

This model is a derivative of FLUX.2-klein-4B and is subject to the
[FLUX.2-klein-4B license](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B/blob/main/LICENSE).