> This release provides **Diffusers-compatible transformer weights only**, enabling reduced memory usage and improved throughput while preserving the high-fidelity instruction-based image editing capabilities of the original model.
> [!IMPORTANT]
> This release compresses **only the diffusion transformer module** (`QwenImageTransformer2DModel`) using **FP8 (F8_E4M3) weight quantization with BF16 compute fallback**. The VAE, scheduler, text encoders, and other pipeline components remain unchanged and must be loaded from the base model.
>
> Quantization format follows the **FP8 W8A8** scheme (FP8 weights with dynamic FP8 activations where supported by hardware).
> FP8 W8A8 reference: [https://docs.vllm.ai/en/stable/features/quantization/fp8/](https://docs.vllm.ai/en/stable/features/quantization/fp8/)
> Quantization recipe: [https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8)
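
To make the F8_E4M3 format concrete, here is a minimal pure-Python sketch of what quantizing a weight to E4M3 means numerically: rounding each value onto the grid of representable 8-bit floats (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, NaN in place of infinities, saturation at ±448). The helper names are illustrative, and this shows only the numerics, not the packed storage layout or scaling factors used by the actual recipe.

```python
def e4m3_values():
    # Enumerate all finite non-negative F8_E4M3 ("fn" variant) magnitudes:
    # 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits; the all-ones
    # exponent with all-ones mantissa is NaN, and there are no infinities.
    vals = []
    for exp in range(16):
        for man in range(8):
            if exp == 15 and man == 7:
                continue  # reserved for NaN
            if exp == 0:
                vals.append((man / 8) * 2 ** -6)          # subnormals
            else:
                vals.append((1 + man / 8) * 2 ** (exp - 7))
    return sorted(vals)

def quantize_e4m3(x):
    # Round-to-nearest onto the representable grid, saturating at +/-448.
    # (Real kernels break ties to even; this sketch picks the first match.)
    grid = e4m3_values()
    mag = min(abs(x), grid[-1])
    q = min(grid, key=lambda v: abs(v - mag))
    return -q if x < 0 else q
```

For example, `quantize_e4m3(0.3)` snaps to the nearest representable value `0.3125`, and anything beyond the format's range saturates to ±448.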
## Diffusers Usage
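
Since only the transformer is compressed, a typical loading pattern is to pull the quantized `QwenImageTransformer2DModel` from this repository and every other component from the base checkpoint. A sketch, assuming a recent `diffusers` with Qwen-Image-Edit support and using a placeholder repo id for the FP8 weights:

```python
import torch
from diffusers import QwenImageEditPipeline, QwenImageTransformer2DModel

# Load the FP8-quantized transformer from this repository.
# "your-namespace/qwen-image-edit-fp8" is a placeholder repo id;
# bfloat16 matches the BF16 compute fallback described above.
transformer = QwenImageTransformer2DModel.from_pretrained(
    "your-namespace/qwen-image-edit-fp8",
    torch_dtype=torch.bfloat16,
)

# VAE, scheduler, and text encoders come unchanged from the base model.
pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```

Dynamic FP8 activation quantization only takes effect on hardware with native FP8 support (e.g. compute capability 8.9+); elsewhere the BF16 fallback path is used.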