---
license: other
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- quantization
- int8
- torchao
- amd
- rocm
base_model: black-forest-labs/FLUX.2-dev
---

# FLUX.2-dev — Attention-only INT8 Weight-Only Transformer (ROCm)

This repository provides an **INT8 weight-only quantized transformer** for [`black-forest-labs/FLUX.2-dev`](https://huggingface.co/black-forest-labs/FLUX.2-dev). It is designed to be:

- ✅ **ROCm-compatible**
- ✅ **Stable on AMD Instinct MI210**
- ✅ **Image-quality preserving**

Only the **attention Linear layers (Q/K/V + output projections)** are quantized; all other components remain in **BF16**.

---

## 🔍 What is included

- ✅ Transformer with **attention-only INT8 weight-only quantization**
- ✅ TorchAO-based quantization (no bitsandbytes)
- ✅ Compatible with **standard Diffusers pipelines**

---

## ❌ What is NOT included

- ❌ VAE
- ❌ Text encoders
- ❌ Scheduler

These components are loaded automatically from the base FLUX.2 model.

---

## 💡 Why attention-only INT8?

Full INT8 quantization of FLUX.2 introduces visible artifacts on ROCm. Quantizing **only the attention layers** provides:

- Significant VRAM reduction
- Stable generation
- No "confetti noise" artifacts
- Safe inference on an MI210 (64 GB)

A sketch of how such a checkpoint can be produced is included at the end of this card.

---

## 🚀 Usage (Diffusers)

```python
import torch
from diffusers import Flux2Pipeline, AutoModel

BASE_MODEL = "black-forest-labs/FLUX.2-dev"
ATTN_INT8 = "AmdGoose/FLUX.2-dev-transformer-attn-int8wo"

dtype = torch.bfloat16  # non-quantized components stay in BF16

# Load the attention-only INT8 transformer. The quantized checkpoint is not
# stored as safetensors, so it must be loaded with use_safetensors=False.
transformer = AutoModel.from_pretrained(
    ATTN_INT8,
    subfolder="transformer_attn_int8wo",
    torch_dtype=dtype,
    use_safetensors=False,
)

# VAE, text encoders, and scheduler come from the base model.
pipe = Flux2Pipeline.from_pretrained(
    BASE_MODEL,
    transformer=transformer,
    torch_dtype=dtype,
)

# Memory savers. enable_model_cpu_offload() handles device placement itself,
# so there is no need to call .to("cuda") manually. Note that on ROCm,
# PyTorch still exposes AMD GPUs under the "cuda" device name.
pipe.enable_attention_slicing()
pipe.vae.enable_tiling()
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="A realistic starter pack figurine in a blister box, studio lighting",
    num_inference_steps=28,
    guidance_scale=4,
    height=1024,
    width=1024,
).images[0]

image.save("out.png")
```
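
---

## 🛠️ How such a checkpoint can be produced (sketch)

For reference, below is a minimal sketch of attention-only INT8 weight-only quantization using TorchAO's `quantize_` with a filter function. This is not the exact export script used for this repository; in particular, the `ATTN_KEYS` name fragments assume the FLUX.2 transformer's attention projections follow the `to_q`/`to_k`/`to_v`/`to_out` naming convention used elsewhere in Diffusers. Verify them against the actual module names (`transformer.named_modules()`) before running.

```python
# Minimal sketch: attention-only INT8 weight-only quantization with TorchAO.
# NOT the exact script behind this repo; ATTN_KEYS is an assumption about
# FLUX.2 module naming -- check transformer.named_modules() first.
import torch
from diffusers import AutoModel
from torchao.quantization import quantize_, int8_weight_only

transformer = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)

# Name fragments assumed to identify attention projections (hypothetical).
ATTN_KEYS = ("to_q", "to_k", "to_v", "to_out")

def is_attention_linear(module: torch.nn.Module, fqn: str) -> bool:
    # Quantize only nn.Linear modules whose fully qualified name looks like
    # an attention projection; every other layer keeps its BF16 weights.
    return isinstance(module, torch.nn.Linear) and any(key in fqn for key in ATTN_KEYS)

# Replace matching Linear weights in place with INT8 weight-only tensors.
quantize_(transformer, int8_weight_only(), filter_fn=is_attention_linear)

# TorchAO tensor subclasses are pickled rather than saved as safetensors,
# which is why the usage example above loads with use_safetensors=False.
transformer.save_pretrained("transformer_attn_int8wo", safe_serialization=False)
```

After quantization, you can confirm the split by inspecting `type(module.weight)` for attention versus non-attention Linear layers: only the former should hold TorchAO quantized tensors.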