AmdGoose committed
Commit b8eb85a · 1 Parent(s): 59bb79a

Update README documentation

Files changed (2):
  1. .gitattributes +0 -34
  2. README.md +42 -36
.gitattributes CHANGED
@@ -1,35 +1 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
- *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
- *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,5 +1,7 @@
  ---
  license: other
  tags:
  - diffusers
  - image-generation
@@ -9,85 +11,89 @@ tags:
  - amd
  - rocm
  base_model: black-forest-labs/FLUX.2-dev
- library_name: diffusers
- pipeline_tag: text-to-image
  ---
- # FLUX.2-dev – Transformer INT8 Weight-Only (torchao)

  This repository provides an **INT8 weight-only quantized transformer** for
  [`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev).

- Only the **transformer** is quantized and redistributed.
- All other components (VAE, text encoders, scheduler, etc.) are loaded from the original model.

- ---

- ## What is included

- - ✅ INT8 weight-only quantized **transformer**
- - ❌ No VAE
- - ❌ No text encoders
- - ❌ No scheduler

- Quantization is performed using **torchao** (INT8 weight-only).

  ---

- ## Why this exists

- - Reduce VRAM usage of FLUX.2-dev
- - Keep compatibility with Diffusers pipelines
- - Avoid bitsandbytes (not supported on ROCm)
- - Enable deployment on AMD GPUs (MI200 / MI210 / MI300)

  ---

- ## Requirements

- - PyTorch with CUDA or ROCm
- - `diffusers` (git main recommended)
- - `torchao`
- - `transformers`
- - `huggingface-hub`

- > ⚠️ The quantized transformer **cannot be loaded with safetensors**.

  ---

- ## How to use

  ```python
  import torch
  from diffusers import Flux2Pipeline, AutoModel

  BASE_MODEL = "black-forest-labs/FLUX.2-dev"
- INT8_REPO = "Atech/FLUX.2-dev-transformer-int8wo"

  dtype = torch.bfloat16

- # Load INT8 transformer
  transformer = AutoModel.from_pretrained(
-     INT8_REPO,
-     subfolder="transformer_int8wo",
      torch_dtype=dtype,
      use_safetensors=False,
- )

- # Build pipeline using original FLUX.2-dev
  pipe = Flux2Pipeline.from_pretrained(
      BASE_MODEL,
      transformer=transformer,
      torch_dtype=dtype,
-     device_map="balanced",  # recommended
  )

- # Example generation
  image = pipe(
-     prompt="A futuristic data center server rack",
-     num_inference_steps=35,
      guidance_scale=4,
      height=1024,
      width=1024,
  ).images[0]

- image.save("output.png")
  ```
  ---
  license: other
+ library_name: diffusers
+ pipeline_tag: text-to-image
  tags:
  - diffusers
  - image-generation
  - amd
  - rocm
  base_model: black-forest-labs/FLUX.2-dev
  ---
+
+ # FLUX.2-dev – Attention-only INT8 Weight-Only Transformer (ROCm)

  This repository provides an **INT8 weight-only quantized transformer** for
  [`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev).

+ It is designed to be:
+
+ - ✅ **ROCm-compatible**
+ - ✅ **Stable on AMD Instinct MI210**
+ - ✅ **Image-quality preserving**
+
+ Only **attention Linear layers (Q/K/V + projections)** are quantized.
+ All other components remain in **BF16**.
+
+ ---
+
+ ## 🔍 What is included
+
+ - ✅ Transformer with **attention-only INT8 weight-only quantization**
+ - ✅ TorchAO-based quantization (no bitsandbytes)
+ - ✅ Compatible with **Diffusers standard pipelines**

  ---

+ ## ❌ What is NOT included
+
+ - ❌ VAE
+ - ❌ Text encoders
+ - ❌ Scheduler
+
+ These components are automatically loaded from the base FLUX.2 model.

  ---

+ ## 💡 Why attention-only INT8?
+
+ Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm.
+ Quantizing **only attention layers** provides:
+
+ - Significant VRAM reduction
+ - Stable generation
+ - No "confetti noise" artifacts
+ - Safe inference on MI210 (64 GB)
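The selection rule described above ("attention Linear layers only, everything else stays BF16") comes down to a name-based predicate that torchao's `quantize_` can take as a `filter_fn`. Below is a minimal, self-contained sketch of such a predicate; the `to_q`/`to_k`/`to_v`/`to_out` names follow Diffusers' attention modules, and whether this exactly matches the rule used for this checkpoint is an assumption:

```python
# Hypothetical layer-selection predicate for attention-only quantization.
# The key names follow Diffusers' attention naming; the exact rule used
# to produce this repo's checkpoint is an assumption.
ATTN_KEYS = ("to_q", "to_k", "to_v", "to_out",
             "add_q_proj", "add_k_proj", "add_v_proj", "to_add_out")

def is_attention_linear(fqn: str) -> bool:
    """True if a fully-qualified module name points at an attention projection."""
    leaf = fqn.rsplit(".", 1)[-1]
    # to_out is an nn.ModuleList in Diffusers, so its Linear is e.g. "...to_out.0"
    if leaf.isdigit():
        leaf = fqn.rsplit(".", 2)[-2]
    return leaf in ATTN_KEYS

# With torchao, the predicate would be used roughly like this (not executed here):
#   from torchao.quantization import quantize_, int8_weight_only
#   quantize_(transformer, int8_weight_only(),
#             filter_fn=lambda mod, fqn: isinstance(mod, nn.Linear)
#                                        and is_attention_linear(fqn))

print(is_attention_linear("transformer_blocks.0.attn.to_q"))      # True
print(is_attention_linear("transformer_blocks.0.attn.to_out.0"))  # True
print(is_attention_linear("transformer_blocks.0.ff.net.0.proj"))  # False
```

Keeping the feed-forward and embedding layers in BF16 is what avoids the quality loss mentioned above, at the cost of a smaller overall VRAM saving.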
 
  ---

+ ## 🚀 Usage (Diffusers)

  ```python
  import torch
  from diffusers import Flux2Pipeline, AutoModel

  BASE_MODEL = "black-forest-labs/FLUX.2-dev"
+ ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

  dtype = torch.bfloat16
+ device = "cuda"  # ROCm also uses the "cuda" device string in PyTorch

  transformer = AutoModel.from_pretrained(
+     ATTN_INT8,
+     subfolder="transformer_attn_int8wo",
      torch_dtype=dtype,
      use_safetensors=False,
+ ).to(device)

  pipe = Flux2Pipeline.from_pretrained(
      BASE_MODEL,
      transformer=transformer,
      torch_dtype=dtype,
  )

+ pipe.enable_attention_slicing()
+ pipe.vae.enable_tiling()
+ pipe.enable_model_cpu_offload()
+
  image = pipe(
+     prompt="A realistic starter pack figurine in a blister box, studio lighting",
+     num_inference_steps=28,
      guidance_scale=4,
      height=1024,
      width=1024,
  ).images[0]

+ image.save("out.png")
  ```
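As a rough sanity check on the VRAM claim in the new README: INT8 weight-only storage uses one byte per weight versus two for BF16, so each quantized Linear roughly halves its weight memory (plus a small overhead for per-channel scales). The layer width below is purely illustrative, not FLUX.2-dev's actual dimensions:

```python
# Back-of-envelope weight-memory math for INT8 weight-only vs BF16.
# The layer size is hypothetical and only illustrates the 2:1 ratio.
def linear_weight_bytes(in_features: int, out_features: int, bytes_per_weight: int) -> int:
    return in_features * out_features * bytes_per_weight

in_f, out_f = 3072, 3072                      # hypothetical attention projection
bf16 = linear_weight_bytes(in_f, out_f, 2)    # 2 bytes per BF16 weight
int8 = linear_weight_bytes(in_f, out_f, 1)    # 1 byte per INT8 weight (+ small scale overhead)

print(bf16 // (1 << 20), "MiB ->", int8 // (1 << 20), "MiB")  # 18 MiB -> 9 MiB
```

Since only the attention projections are quantized here, the end-to-end saving is smaller than 2x but still significant, which is what makes the model fit comfortably on a 64 GB MI210.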