AmdGoose/FLUX.2-dev-transformer-int8wo


	---
	license: other
	library_name: diffusers
	pipeline_tag: text-to-image
	tags:
	- diffusers
	- image-generation
	- quantization
	- int8
	- torchao
	- amd
	- rocm
	base_model: black-forest-labs/FLUX.2-dev
	---

	# FLUX.2-dev — Attention-only INT8 Weight-Only Transformer (ROCm)

	This repository provides an INT8 weight-only quantized transformer for
	[`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev).

	It is designed to be:

	- ✅ ROCm-compatible
	- ✅ Stable on AMD Instinct MI210
	- ✅ Image-quality preserving

	Only the attention `Linear` layers (Q/K/V + projections) are quantized.
	All other layers of the transformer remain in BF16.
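
To make the "attention-only" selection concrete, here is a minimal sketch of a name-based filter. The module names and the `ATTENTION_MARKERS` tuple are hypothetical illustrations, not taken from the FLUX.2 transformer, and the card does not state how this checkpoint's layers were actually selected.

```python
# Hypothetical name fragments that mark an attention block; the real
# FLUX.2 transformer may use different module names.
ATTENTION_MARKERS = ("attn", "attention")


def is_attention_layer(fqn: str) -> bool:
    """Return True if any dotted component of the module name marks an attention block."""
    return any(
        marker in part
        for part in fqn.split(".")
        for marker in ATTENTION_MARKERS
    )


# Illustrative fully qualified module names (assumed, not read from the model).
example_names = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.attn.to_k",
    "transformer_blocks.0.attn.to_v",
    "transformer_blocks.0.attn.to_out.0",
    "transformer_blocks.0.ff.linear_in",
    "norm_out.linear",
]

selected = [name for name in example_names if is_attention_layer(name)]
kept_bf16 = [name for name in example_names if not is_attention_layer(name)]

print("INT8 weight-only:", selected)
print("kept in BF16:    ", kept_bf16)
```

TorchAO's `quantize_` accepts a `filter_fn(module, fqn)` callback, so a predicate like this, combined with an `isinstance(module, torch.nn.Linear)` check, could plausibly be passed there to restrict quantization to attention layers.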

	---

	## 🔍 What is included

	- ✅ Transformer with attention-only INT8 weight-only quantization
	- ✅ TorchAO-based quantization (no bitsandbytes)
	- ✅ Compatible with standard Diffusers pipelines
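
For readers unfamiliar with the term, the sketch below illustrates what INT8 weight-only quantization does to a weight matrix, in plain Python. It assumes symmetric quantization with one scale per output channel (row). TorchAO's exact scheme may differ in detail, and the weights are made-up values.

```python
def quantize_row_int8(row):
    """Symmetric int8 quantization of one output channel (one weight row)."""
    max_abs = max(abs(w) for w in row)
    # One floating-point scale per row; guard against an all-zero row.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate floating-point weights at matmul time."""
    return [v * scale for v in q]


# Made-up weights: two output channels with very different magnitudes.
weights = [
    [0.50, -1.27, 0.03, 0.90],
    [0.002, 0.004, -0.001, 0.0],
]

for row in weights:
    q, scale = quantize_row_int8(row)
    restored = dequantize_row(q, scale)
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    print(q, f"scale={scale:.6f}", f"max_err={max_err:.6f}")
```

Only the stored weights become 8-bit integers. They are rescaled at compute time and activations stay in the higher-precision dtype, which is why the approach is called weight-only.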

	---

	## ❌ What is NOT included

	- ❌ VAE
	- ❌ Text encoders
	- ❌ Scheduler

	These components are loaded from the base `black-forest-labs/FLUX.2-dev` repository when the pipeline is created.

	---

	## 💡 Why attention-only INT8?

	Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm.
	Quantizing only attention layers provides:

	- Significant VRAM reduction
	- Stable generation
	- No "confetti noise" artifacts
	- Safe inference on MI210 (64 GB)
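
As a rough way to reason about the VRAM reduction, the sketch below estimates weight storage when a fraction of the parameters is held in INT8 (1 byte each) and the rest in BF16 (2 bytes each). The parameter count and attention fraction are placeholders for illustration, not FLUX.2 measurements. Quantization scales and activation memory are ignored.

```python
GIB = 1024 ** 3


def weight_memory_gib(total_params: float, int8_fraction: float) -> float:
    """Approximate weight storage in GiB: 1 byte per INT8 parameter, 2 per BF16 parameter."""
    if not 0.0 <= int8_fraction <= 1.0:
        raise ValueError("int8_fraction must be between 0 and 1")
    int8_bytes = total_params * int8_fraction * 1
    bf16_bytes = total_params * (1.0 - int8_fraction) * 2
    return (int8_bytes + bf16_bytes) / GIB


# Placeholder inputs for illustration only, not FLUX.2 measurements.
total_params = 10e9
attention_fraction = 0.3

baseline = weight_memory_gib(total_params, 0.0)
mixed = weight_memory_gib(total_params, attention_fraction)

print(f"all BF16:       {baseline:.1f} GiB")
print(f"attention INT8: {mixed:.1f} GiB")
print(f"saved:          {100 * (1 - mixed / baseline):.0f}%")
```

The saving scales linearly with the share of parameters that sit in attention layers, so the real figure depends on the architecture and should be measured on the target GPU.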

	---

	## 🚀 Usage (Diffusers)

	```python
	import torch
	from diffusers import AutoModel, Flux2Pipeline

	BASE_MODEL = "black-forest-labs/FLUX.2-dev"
	# Repository id as shown on this model page.
	ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-int8wo"

	dtype = torch.bfloat16
	device = "cuda"  # ROCm builds of PyTorch expose AMD GPUs as "cuda"

	# Load the attention-only INT8 transformer from this repository.
	# use_safetensors=False: the quantized weights are not stored as safetensors.
	transformer = AutoModel.from_pretrained(
	    ATTN_INT8,
	    subfolder="transformer_attn_int8wo",
	    torch_dtype=dtype,
	    use_safetensors=False,
	).to(device)

	# The VAE, text encoders, and scheduler come from the base model.
	pipe = Flux2Pipeline.from_pretrained(
	    BASE_MODEL,
	    transformer=transformer,
	    torch_dtype=dtype,
	)

	# Memory-saving options.
	pipe.enable_attention_slicing()
	pipe.vae.enable_tiling()
	pipe.enable_model_cpu_offload()

	image = pipe(
	    prompt="A realistic starter pack figurine in a blister box, studio lighting",
	    num_inference_steps=28,
	    guidance_scale=4,
	    height=1024,
	    width=1024,
	).images[0]

	image.save("out.png")
	```