---
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- quantization
- int8
- torchao
- amd
- rocm
base_model: black-forest-labs/FLUX.2-dev
---
# FLUX.2-dev – Attention-only INT8 Weight-Only Transformer (ROCm)
This repository provides an **INT8 weight-only quantized transformer** for
[`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev).
It is designed to be:
- ✅ **ROCm-compatible**
- ✅ **Stable on AMD Instinct MI210**
- ✅ **Image-quality preserving**
Only **attention Linear layers (Q/K/V + projections)** are quantized.
All other components remain in **BF16**.
---
## 📦 What is included
- ✅ Transformer with **attention-only INT8 weight-only quantization**
- ✅ TorchAO-based quantization (no bitsandbytes)
- ✅ Compatible with **Diffusers standard pipelines**
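The TorchAO recipe above can be sketched as follows. This is a minimal sketch, not the exact script used to produce this checkpoint: it assumes torchao's `quantize_` / `int8_weight_only` API, and the attention module-name fragments in `ATTN_KEYS` are assumptions that may differ from the real FLUX.2 transformer layer names.

```python
import torch
from torch import nn

# HYPOTHETICAL name fragments for attention Linear layers (Q/K/V + output
# projections); the actual fully-qualified names in FLUX.2 may differ.
ATTN_KEYS = ("to_q", "to_k", "to_v", "to_out")

def is_attention_linear(module: nn.Module, fqn: str) -> bool:
    """filter_fn for torchao's quantize_: select only Linear layers whose
    fully-qualified name looks like an attention projection."""
    return isinstance(module, nn.Linear) and any(k in fqn for k in ATTN_KEYS)

# With torchao installed, the transformer could then be quantized in place,
# leaving every non-matching module in BF16:
# from torchao.quantization import quantize_, int8_weight_only
# quantize_(transformer, int8_weight_only(), filter_fn=is_attention_linear)
```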
---
## ❌ What is NOT included
- ❌ VAE
- ❌ Text encoders
- ❌ Scheduler
These components are automatically loaded from the base FLUX.2 model.
---
## 💡 Why attention-only INT8?
Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm.
Quantizing **only attention layers** provides:
- Significant VRAM reduction
- Stable generation
- No "confetti noise" artifacts
- Safe inference on MI210 (64 GB)
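As a back-of-envelope check: weight-only INT8 stores each quantized weight in 1 byte instead of BF16's 2 bytes, so the saving is roughly half of the attention-Linear weight memory. A minimal sketch (the parameter count in the example comment is hypothetical, for illustration only):

```python
# Bytes per parameter for the two storage formats.
BYTES_BF16 = 2
BYTES_INT8 = 1

def weight_bytes_saved(attn_params: int) -> int:
    """Bytes saved by storing `attn_params` attention-Linear weights
    in INT8 instead of BF16 (weight-only; activations stay BF16)."""
    return attn_params * (BYTES_BF16 - BYTES_INT8)

# e.g. a hypothetical 10B parameters in attention Linears:
# weight_bytes_saved(10_000_000_000) / 1024**3  ->  about 9.3 GiB saved
```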
---
## 🚀 Usage (Diffusers)
```python
import torch
from diffusers import Flux2Pipeline, AutoModel

BASE_MODEL = "black-forest-labs/FLUX.2-dev"
ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

dtype = torch.bfloat16
device = "cuda"  # ROCm builds of PyTorch also use the "cuda" device name

# Load the attention-only INT8 transformer from this repository.
transformer = AutoModel.from_pretrained(
    ATTN_INT8,
    subfolder="transformer_attn_int8wo",
    torch_dtype=dtype,
    use_safetensors=False,  # TorchAO-quantized weights are stored as a pickled checkpoint
).to(device)

# All other components (VAE, text encoders, scheduler) come from the base model.
pipe = Flux2Pipeline.from_pretrained(
    BASE_MODEL,
    transformer=transformer,
    torch_dtype=dtype,
)
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4.0,
    height=1024,
    width=1024,
).images[0]
image.save("out.png")
```