prithivMLmods committed on
Commit c93bb1c · verified · 1 parent: cb7a36a

Update README.md

Files changed (1): README.md (+1 −5)
README.md CHANGED
@@ -21,11 +21,7 @@ tags:
 > This release provides **Diffusers-compatible transformer weights only**, enabling reduced memory usage and improved throughput while preserving the high-fidelity instruction-based image editing capabilities of the original model.
 
 > [!important]
-> This release compresses **only the diffusion transformer module** (`QwenImageTransformer2DModel`) using **FP8 (F8_E4M3) weight quantization with BF16 compute fallback**. The VAE, scheduler, text encoders, and other pipeline components remain unchanged and must be loaded from the base model.
->
-> Quantization format follows the **FP8 W8A8** scheme (FP8 weights with dynamic FP8 activations where supported by hardware).
-> FP8 W8A8 reference: [https://docs.vllm.ai/en/stable/features/quantization/fp8/](https://docs.vllm.ai/en/stable/features/quantization/fp8/)
-> Quantization recipe: [https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8)
+> This release compresses **only the diffusion transformer module** (`QwenImageTransformer2DModel`) using **FP8 (F8_E4M3) weight quantization with BF16 compute fallback**. The VAE, scheduler, text encoders, and other pipeline components remain unchanged and must be loaded from the base model. FP8 (8-bit floating point) weight and activation quantization using hardware acceleration on GPUs – [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). Quantization W8A8 FP8-dynamic recipe – [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).
 
 ## Diffusers Usage
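The note in the diff implies a two-step load: the FP8 transformer from this release, and every other pipeline component from the base model. A minimal sketch of that split, assuming a recent `diffusers` with Qwen-Image support and `torch` are installed; the transformer repository id is left as a placeholder, and `Qwen/Qwen-Image-Edit` is an assumed base-model id:

```python
def load_fp8_edit_pipeline(
    transformer_repo: str,  # this repository's id (placeholder -- substitute the real one)
    base_repo: str = "Qwen/Qwen-Image-Edit",  # assumed base-model id
):
    """Load the FP8 transformer from this release and all remaining pipeline
    components (VAE, scheduler, text encoders) from the base model."""
    # Deferred imports: defining this sketch does not require torch/diffusers.
    import torch
    from diffusers import QwenImageEditPipeline, QwenImageTransformer2DModel

    transformer = QwenImageTransformer2DModel.from_pretrained(
        transformer_repo, torch_dtype=torch.bfloat16
    )
    return QwenImageEditPipeline.from_pretrained(
        base_repo, transformer=transformer, torch_dtype=torch.bfloat16
    )
```

A call such as `pipe = load_fp8_edit_pipeline("<this-repo-id>")` then behaves like the unquantized pipeline; only the transformer weights differ.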
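The FP8-dynamic scheme referenced in the note scales each tensor so its values fit the E4M3 range (4 exponent bits, 3 mantissa bits, maximum finite value 448) before rounding to 8 bits. A pure-Python illustration of that per-tensor scale-quantize-dequantize round trip, modeling only the numerics, not the actual GPU kernels:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 (fn) value."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)       # saturate at the largest finite E4M3 value
    e = math.floor(math.log2(a))
    e = max(min(e, 8), -6)       # normal exponent range 2^-6 .. 2^8; below it, subnormal spacing
    step = 2.0 ** (e - 3)        # 3 mantissa bits -> 8 steps per binade
    q = round(a / step) * step   # round half to even, as in IEEE default rounding
    return sign * min(q, 448.0)

def fp8_dynamic_round_trip(weights):
    """Per-tensor dynamic scaling: map the max magnitude onto 448,
    quantize each value to E4M3, then scale back."""
    scale = max(abs(w) for w in weights) / 448.0
    return [quantize_e4m3(w / scale) * scale for w in weights]
```

Values that land on representable E4M3 code points after scaling survive the round trip exactly; everything else is rounded to one of the 8 mantissa steps in its binade, which is the resolution the BF16 compute fallback then works with.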