---
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - diffusers
  - image-generation
  - quantization
  - int8
  - torchao
  - amd
  - rocm
base_model: black-forest-labs/FLUX.2-dev
---

# FLUX.2-dev β€” Attention-only INT8 Weight-Only Transformer (ROCm)

This repository provides an **INT8 weight-only quantized transformer** for  
[`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev).

It is designed to be:

- βœ… **ROCm-compatible**
- βœ… **Stable on AMD Instinct MI210**
- βœ… **Image-quality preserving**

Only **attention Linear layers (Q/K/V + projections)** are quantized.
All other components remain in **BF16**.

---

## πŸ” What is included

- βœ… Transformer with **attention-only INT8 weight-only quantization**
- βœ… TorchAO-based quantization (no bitsandbytes)
- βœ… Compatible with **Diffusers standard pipelines**

---

## ❌ What is NOT included

- ❌ VAE
- ❌ Text encoders
- ❌ Scheduler

These components are automatically loaded from the base FLUX.2 model.

---

## πŸ’‘ Why attention-only INT8?

Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm.
Quantizing **only attention layers** provides:

- Significant VRAM reduction
- Stable generation
- No "confetti noise" artifacts
- Safe inference on MI210 (64 GB)
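The VRAM saving follows directly from the storage formats: BF16 weights take 2 bytes per parameter, INT8 weights take 1 byte (plus small per-channel scales, ignored here). The parameter counts below are hypothetical placeholders for illustration, not measured FLUX.2 figures.

```python
BYTES_BF16 = 2  # bytes per BF16 weight
BYTES_INT8 = 1  # bytes per INT8 weight (per-channel scales omitted)

def weight_bytes(n_params: int, quantized: bool) -> int:
    """Bytes needed to store n_params weights in INT8 or BF16."""
    return n_params * (BYTES_INT8 if quantized else BYTES_BF16)

# Hypothetical split between attention and non-attention parameters.
attn_params = 8_000_000_000
other_params = 24_000_000_000

full_bf16 = weight_bytes(attn_params + other_params, quantized=False)
attn_int8 = weight_bytes(attn_params, quantized=True) + weight_bytes(other_params, quantized=False)

# Quantizing only the attention weights saves exactly 1 byte per attention
# parameter: here ~7.45 GiB, while everything else stays in BF16.
saved_gib = (full_bf16 - attn_int8) / 2**30
```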

---

## πŸš€ Usage (Diffusers)

```python
import torch
from diffusers import Flux2Pipeline, AutoModel

BASE_MODEL = "black-forest-labs/FLUX.2-dev"
ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

dtype = torch.bfloat16
device = "cuda"  # ROCm uses "cuda" in PyTorch

transformer = AutoModel.from_pretrained(
    ATTN_INT8,
    subfolder="transformer_attn_int8wo",
    torch_dtype=dtype,
    use_safetensors=False,
).to(device)

pipe = Flux2Pipeline.from_pretrained(
    BASE_MODEL,
    transformer=transformer,
    torch_dtype=dtype,
)

pipe.enable_attention_slicing()
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4,
    height=1024,
    width=1024,
).images[0]

image.save("out.png")
```