---
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- quantization
- int8
- torchao
- amd
- rocm
base_model: black-forest-labs/FLUX.2-dev
---

# FLUX.2-dev: Attention-only INT8 Weight-Only Transformer (ROCm)

This repository provides an **INT8 weight-only quantized transformer** for
[`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev).

It is designed to be:

- ✅ **ROCm-compatible**
- ✅ **Stable on AMD Instinct MI210**
- ✅ **Image-quality preserving**

Only **attention Linear layers (Q/K/V + projections)** are quantized.
All other components remain in **BF16**.
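
As a rough illustration of what "attention-only" selection can look like, the sketch below filters modules by type and qualified name. The name fragments in `ATTN_NAME_HINTS` and the helper `is_attention_linear` are assumptions made for this example, not names confirmed for the FLUX.2 transformer; inspect `transformer.named_modules()` to find the real ones. TorchAO's `quantize_` accepts a `filter_fn(module, fqn)` callback of this shape, so a predicate like this could be passed to it.

```python
# Hypothetical name fragments for attention projections; the real FLUX.2
# module names may differ, so check transformer.named_modules() first.
ATTN_NAME_HINTS = ("to_q", "to_k", "to_v", "to_out")


def is_attention_linear(module, fqn: str) -> bool:
    """Return True for Linear modules whose dotted name contains an attention hint."""
    # Compare by class name so the sketch does not need to import torch.
    is_linear = type(module).__name__ == "Linear"
    parts = fqn.split(".")
    return is_linear and any(part in ATTN_NAME_HINTS for part in parts)


# Stand-in classes so the sketch runs without PyTorch installed.
class Linear:
    pass


class LayerNorm:
    pass


if __name__ == "__main__":
    candidates = [
        (Linear(), "transformer_blocks.0.attn.to_q"),
        (Linear(), "transformer_blocks.0.attn.to_out.0"),
        (Linear(), "transformer_blocks.0.ff.linear_in"),
        (LayerNorm(), "transformer_blocks.0.norm1"),
    ]
    for module, name in candidates:
        print(name, is_attention_linear(module, name))
```

Matching on whole dotted-name components rather than substrings avoids accidentally selecting unrelated layers whose names merely contain a hint.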

---

## What is included

- ✅ Transformer with **attention-only INT8 weight-only quantization**
- ✅ TorchAO-based quantization (no bitsandbytes)
- ✅ Compatible with **Diffusers standard pipelines**
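
For readers unfamiliar with the term, weight-only INT8 quantization stores each weight as an 8-bit integer plus a floating-point scale and dequantizes at compute time. The sketch below shows a generic symmetric per-row scheme in plain Python. It is an illustration under that assumption, not TorchAO's actual implementation, and `quantize_row` and `dequantize_row` are hypothetical helper names.

```python
def quantize_row(row):
    """Symmetric INT8 quantization of one weight row: returns (int values, scale)."""
    max_abs = max(abs(w) for w in row)
    # Map the largest magnitude to 127; use a dummy scale for an all-zero row.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate floating-point weights from INT8 values and the scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.003]
    q, scale = quantize_row(weights)
    restored = dequantize_row(q, scale)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

With round-to-nearest, the reconstruction error per weight is bounded by half the scale, which is why rows with one very large outlier lose more precision on their small weights.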

---

## ❌ What is NOT included

- ❌ VAE
- ❌ Text encoders
- ❌ Scheduler

These components are automatically loaded from the base FLUX.2 model.

---

## 💡 Why attention-only INT8?

Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm.
Quantizing **only attention layers** provides:

- Significant VRAM reduction
- Stable generation
- No "confetti noise" artifacts
- Safe inference on MI210 (64 GB)
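
The VRAM saving can be estimated with simple arithmetic: BF16 uses 2 bytes per weight and INT8 uses 1, so only the quantized fraction of the weights shrinks. The sketch below makes that explicit. The parameter count and attention fraction in the usage lines are placeholder values chosen for illustration, not measurements of FLUX.2-dev, and the estimate ignores the small scale tensors and all activation memory.

```python
GIB = 2**30


def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Size in GiB of num_params weights stored at bytes_per_param each."""
    return num_params * bytes_per_param / GIB


def estimate_weight_memory(total_params: float, attn_fraction: float):
    """Return (bf16_gib, mixed_gib) for all-BF16 vs. INT8-attention weights."""
    attn_params = total_params * attn_fraction
    other_params = total_params - attn_params
    bf16_gib = weight_gib(total_params, 2)
    # Attention weights at 1 byte each, everything else left at 2 bytes.
    mixed_gib = weight_gib(attn_params, 1) + weight_gib(other_params, 2)
    return bf16_gib, mixed_gib


if __name__ == "__main__":
    # Placeholder inputs, not the real FLUX.2-dev figures.
    bf16, mixed = estimate_weight_memory(total_params=10e9, attn_fraction=0.4)
    print(f"all BF16: {bf16:.1f} GiB, INT8 attention: {mixed:.1f} GiB")
```

Substituting the real parameter counts from `transformer.named_parameters()` would give an estimate specific to this model.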

---

## Usage (Diffusers)

```python
import torch
from diffusers import Flux2Pipeline, AutoModel

BASE_MODEL = "black-forest-labs/FLUX.2-dev"
ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

dtype = torch.bfloat16
device = "cuda"  # ROCm uses "cuda" in PyTorch

# Load the attention-only INT8 transformer from this repository.
transformer = AutoModel.from_pretrained(
    ATTN_INT8,
    subfolder="transformer_attn_int8wo",
    torch_dtype=dtype,
    use_safetensors=False,
).to(device)

# Build the pipeline; the VAE, text encoders, and scheduler come from the base model.
pipe = Flux2Pipeline.from_pretrained(
    BASE_MODEL,
    transformer=transformer,
    torch_dtype=dtype,
)

# Memory-saving options.
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4,
    height=1024,
    width=1024,
).images[0]

image.save("out.png")