prithivMLmods committed on
Commit c93bb1c · verified · 1 parent: cb7a36a

Update README.md

Files changed (1): README.md (+1 −5)
README.md CHANGED
@@ -21,11 +21,7 @@ tags:
 > This release provides **Diffusers-compatible transformer weights only**, enabling reduced memory usage and improved throughput while preserving the high-fidelity instruction-based image editing capabilities of the original model.
 
 > [!important]
-> This release compresses **only the diffusion transformer module** (`QwenImageTransformer2DModel`) using **FP8 (F8_E4M3) weight quantization with BF16 compute fallback**. The VAE, scheduler, text encoders, and other pipeline components remain unchanged and must be loaded from the base model.
->
-> Quantization format follows the **FP8 W8A8** scheme (FP8 weights with dynamic FP8 activations where supported by hardware).
-> FP8 W8A8 reference: [https://docs.vllm.ai/en/stable/features/quantization/fp8/](https://docs.vllm.ai/en/stable/features/quantization/fp8/)
-> Quantization recipe: [https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8)
+> This release compresses **only the diffusion transformer module** (`QwenImageTransformer2DModel`) using **FP8 (F8_E4M3) weight quantization with BF16 compute fallback**. The VAE, scheduler, text encoders, and other pipeline components remain unchanged and must be loaded from the base model. FP8 (8-bit floating point) weight and activation quantization using hardware acceleration on GPUs – [FP8 W8A8](https://docs.vllm.ai/en/stable/features/quantization/fp8/). Quantization W8A8 FP8-dynamic recipe – [examples](https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8).
 
 ## Diffusers Usage
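The note in the diff implies a two-step load: the FP8 transformer from this release, and every other pipeline component from the base model. A minimal sketch of that split, assuming a recent `diffusers` with Qwen-Image support and `torch` are installed; the transformer repository id is left as a placeholder, and `Qwen/Qwen-Image-Edit` is an assumed base-model id:

```python
def load_fp8_edit_pipeline(
    transformer_repo: str,  # this repository's id (placeholder -- substitute the real one)
    base_repo: str = "Qwen/Qwen-Image-Edit",  # assumed base-model id
):
    """Load the FP8 transformer from this release and all remaining pipeline
    components (VAE, scheduler, text encoders) from the base model."""
    # Deferred imports: defining this sketch does not require torch/diffusers.
    import torch
    from diffusers import QwenImageEditPipeline, QwenImageTransformer2DModel

    transformer = QwenImageTransformer2DModel.from_pretrained(
        transformer_repo, torch_dtype=torch.bfloat16
    )
    return QwenImageEditPipeline.from_pretrained(
        base_repo, transformer=transformer, torch_dtype=torch.bfloat16
    )
```

A call such as `pipe = load_fp8_edit_pipeline("<this-repo-id>")` then behaves like the unquantized pipeline; only the transformer weights differ.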
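The FP8-dynamic scheme referenced in the note scales each tensor so its values fit the E4M3 range (4 exponent bits, 3 mantissa bits, maximum finite value 448) before rounding to 8 bits. A pure-Python illustration of that per-tensor scale-quantize-dequantize round trip, modeling only the numerics, not the actual GPU kernels:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest representable FP8 E4M3 (fn) value."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), 448.0)       # saturate at the largest finite E4M3 value
    e = math.floor(math.log2(a))
    e = max(min(e, 8), -6)       # normal exponent range 2^-6 .. 2^8; below it, subnormal spacing
    step = 2.0 ** (e - 3)        # 3 mantissa bits -> 8 steps per binade
    q = round(a / step) * step   # round half to even, as in IEEE default rounding
    return sign * min(q, 448.0)

def fp8_dynamic_round_trip(weights):
    """Per-tensor dynamic scaling: map the max magnitude onto 448,
    quantize each value to E4M3, then scale back."""
    scale = max(abs(w) for w in weights) / 448.0
    return [quantize_e4m3(w / scale) * scale for w in weights]
```

Values that land on representable E4M3 code points after scaling survive the round trip exactly; everything else is rounded to one of the 8 mantissa steps in its binade, which is the resolution the BF16 compute fallback then works with.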