Update README documentation
Files changed:
- .gitattributes: +0 -34
- README.md: +42 -36
.gitattributes
CHANGED

```diff
@@ -1,35 +1 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
 *.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
```

After this change, only `*.bin filter=lfs diff=lfs merge=lfs -text` remains tracked via Git LFS.
README.md
CHANGED

The `library_name` and `pipeline_tag` keys were moved earlier in the frontmatter, and the README body was rewritten. Lines removed from the previous version (hunks `@@ -1,5 +1,7 @@` and `@@ -9,85 +11,89 @@`):

```diff
-library_name: diffusers
-pipeline_tag: text-to-image
-All other components (VAE, text encoders, scheduler, etc.) are loaded from the original model.
-- ❌ No scheduler
-- Keep compatibility with Diffusers pipelines
-- Avoid bitsandbytes (not supported on ROCm)
-- Enable deployment on AMD GPUs (MI200 / MI210 / MI300)
-- `torchao`
-- `transformers`
-- `huggingface-hub`
-# Load INT8 transformer
-subfolder="
-)
-# Build pipeline using original FLUX.2-dev
-device_map="balanced",  # recommended
-prompt="A
-num_inference_steps=
-image.save("
```

The updated README:
---
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- amd
- rocm
base_model: black-forest-labs/FLUX.2-dev
---

# FLUX.2-dev – Attention-only INT8 Weight-Only Transformer (ROCm)

This repository provides an **INT8 weight-only quantized transformer** for
[`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev).
It is designed to be:

- ✅ **ROCm-compatible**
- ✅ **Stable on AMD Instinct MI210**
- ✅ **Image-quality preserving**

Only **attention Linear layers (Q/K/V + projections)** are quantized.
All other components remain in **BF16**.
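For context, below is a minimal sketch of how an attention-only INT8 weight-only checkpoint like this one could be produced with TorchAO. The `.attn` module-name filter and the output folder name are illustrative assumptions, not the exact recipe used for this repository.

```python
# Illustrative sketch only: attention-only INT8 weight-only quantization with
# torchao. The ".attn" name filter and the output path are assumptions, not
# the exact recipe used to build this repository.
import torch
from diffusers import AutoModel
from torchao.quantization import quantize_, int8_weight_only

transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

def attention_linear_only(module, fqn):
    # Quantize only nn.Linear layers inside attention blocks (Q/K/V and
    # output projections); every other weight stays in BF16.
    return isinstance(module, torch.nn.Linear) and ".attn" in fqn

quantize_(transformer, int8_weight_only(), filter_fn=attention_linear_only)

# torchao tensor subclasses are serialized with pickle rather than safetensors,
# which is why the usage snippet below loads with use_safetensors=False.
transformer.save_pretrained("transformer_attn_int8wo", safe_serialization=False)
```

Because only the weights are quantized, activations stay in BF16 and the surrounding pipeline code does not need to change.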
---

## What is included

- ✅ Transformer with **attention-only INT8 weight-only quantization**
- ✅ TorchAO-based quantization (no bitsandbytes)
- ✅ Compatible with **Diffusers standard pipelines**

---

## What is NOT included

- ❌ VAE
- ❌ Text encoders
- ❌ Scheduler

These components are automatically loaded from the base FLUX.2 model.
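To see the split for yourself, the file listings of this repository and the base model can be compared with `huggingface_hub` (network access, and an access token for the base model if it is gated, are assumed):

```python
# Compare what ships in this repo (the quantized transformer) with the full
# base model that supplies the VAE, text encoders and scheduler.
from huggingface_hub import list_repo_files

print(list_repo_files("AmdGoose/FLUX.2-dev-transformer-attn-int8wo"))
print(list_repo_files("black-forest-labs/FLUX.2-dev"))
```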
---

## Why attention-only INT8?

Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm.
Quantizing **only attention layers** provides:

- Significant VRAM reduction (a rough estimate is sketched after this list)
- Stable generation
- No "confetti noise" artifacts
- Safe inference on MI210 (64 GB)
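For a rough sense of the VRAM reduction, count the weights held by attention Linear layers and compare 2 bytes per weight (BF16) against roughly 1 byte (INT8 weight-only, ignoring the small per-channel scales). The `.attn` name match is the same naming assumption used in the quantization sketch above:

```python
# Back-of-the-envelope estimate of the weight memory saved by attention-only
# INT8 weight-only quantization. The ".attn" name match is an assumption
# about FLUX.2 module naming.
import torch

def attention_weight_gigabytes(model):
    n = sum(
        m.weight.numel()
        for name, m in model.named_modules()
        if isinstance(m, torch.nn.Linear) and ".attn" in name
    )
    return {"bf16_GB": 2 * n / 1e9, "int8_GB": 1 * n / 1e9, "saved_GB": n / 1e9}
```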
---

## Usage (Diffusers)

```python
import torch
from diffusers import Flux2Pipeline, AutoModel

BASE_MODEL = "black-forest-labs/FLUX.2-dev"
ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

dtype = torch.bfloat16
device = "cuda"  # ROCm uses "cuda" in PyTorch

# Load the attention-only INT8 transformer from this repository
transformer = AutoModel.from_pretrained(
    ATTN_INT8,
    subfolder="transformer_attn_int8wo",
    torch_dtype=dtype,
    use_safetensors=False,
).to(device)

# Build the pipeline; VAE, text encoders and scheduler come from the base model
pipe = Flux2Pipeline.from_pretrained(
    BASE_MODEL,
    transformer=transformer,
    torch_dtype=dtype,
)

# Optional memory savers (attention slicing, VAE tiling, CPU offload)
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4,
    height=1024,
    width=1024,
).images[0]

image.save("out.png")
```
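To sanity-check memory headroom on your own GPU, the pipeline from the snippet above can be wrapped with PyTorch's peak-memory counters (ROCm devices report through the same `torch.cuda` APIs):

```python
# Optional sanity check: measure peak GPU memory for one generation run.
# Reuses `pipe` from the usage snippet above.
import torch

torch.cuda.reset_peak_memory_stats()
image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4,
    height=1024,
    width=1024,
).images[0]
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1e9:.1f} GB")
```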