FLUX.2-dev INT8 W8A8 ConvRot Quant

INT8 W8A8 ConvRot quantization of black-forest-labs/FLUX.2-dev, packaged for use with ComfyUI-INT8-Fast.

It is a quantized version of the original FLUX.2-dev weights intended to reduce VRAM use and improve inference speed.

Tests

On an RTX 4090, I got roughly 2x the it/s.

Timings on an RTX 3090 (PCIe 4.0 x16), 64GB DDR4-3600 RAM, ComfyUI in Docker under WSL2, using the fp8 text encoder:

bf16 has not been tested on the RTX 3090 yet.
Image generation (1024x1024 output):
- Pi-Flow 4-step adapter, no torch.compile: 2.7 s/it
Image edit (1024x1024 input and output):
- Pi-Flow 4-step adapter, no torch.compile:
  - cold start (first run): 9.7 s/it
  - warm: 6.4 s/it
- SageAttention 2.2.0 brings this down to 5.5 s/it, but quality degrades, so I didn't test it further.
torch.compile made no difference, and with the native torchCompileModel node it actually slowed inference down.

Compared to Flux2TurboComfyv2, Pi-Flow gives noticeably better quality despite using only 4 steps.

At least with this setup, the main speed bottleneck is the text encoder. With the RAM overhead of Windows + WSL2 + Docker, the system sometimes falls back to swap memory. This may also be partly caused by my adaptation of BobJohnson24's INT8 model loader to handle Pi-Flow. An INT4/GGUF text encoder could alleviate this, but the quality loss is noticeable; a remote text encoder is a better fit.

To summarize:

- Changing only the seed or input image: ~20s/image
- Changing the prompt (the text encoder has to run again): ~100s/image

Model Details

Base model: black-forest-labs/FLUX.2-dev
Quantization: INT8 W8A8
Rotation method: ConvRot
Target runtime: ComfyUI with ComfyUI-INT8-Fast
Model type: Rectified flow transformer image generation / editing model
License: FLUX Non-Commercial License, inherited from FLUX.2-dev
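
As a rough illustration of what "W8A8 with rotation" means: the weights and activations of each linear layer are quantized to INT8 with floating-point scales, and an orthogonal rotation R is applied to both sides first, so that (xR)(WR)^T = xW^T is unchanged while outlier channels get spread out before quantization. The sketch below is a generic, hypothetical NumPy illustration of that idea, not ConvRot's actual algorithm and not the ComfyUI-INT8-Fast kernels. It assumes a random orthogonal matrix as a stand-in for ConvRot's rotation, per-token activation scales, per-output-channel weight scales and an int32 accumulator.

```python
import numpy as np


def quantize_sym_int8(x: np.ndarray, axis: int):
    """Symmetric INT8 quantization with one scale per slice along `axis`."""
    amax = np.abs(x).max(axis=axis, keepdims=True)
    scale = np.maximum(amax, 1e-8) / 127.0            # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def random_orthogonal(n: int, seed: int = 0) -> np.ndarray:
    """Stand-in rotation (NOT ConvRot): a random orthogonal matrix via QR."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))                     # column sign fix keeps q orthogonal


def w8a8_linear(x: np.ndarray, w: np.ndarray, rot: np.ndarray) -> np.ndarray:
    """Approximate x @ w.T with INT8 activations and INT8 weights.

    Since rot is orthogonal, (x @ rot) @ (w @ rot).T == x @ w.T in exact
    arithmetic, so rotating both sides only changes the quantization error.
    """
    qx, sx = quantize_sym_int8(x @ rot, axis=1)        # per-token scales, shape (tokens, 1)
    qw, sw = quantize_sym_int8(w @ rot, axis=1)        # per-output-channel scales, shape (out, 1)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T  # integer matmul with int32 accumulation
    return acc * sx * sw.T                             # dequantize back to float


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((16, 64))
    x[:, 0] *= 50.0                                    # one outlier activation channel
    w = 0.1 * rng.standard_normal((32, 64))
    ref = x @ w.T
    for name, rot in [("no rotation", np.eye(64)), ("random rotation", random_orthogonal(64))]:
        err = np.linalg.norm(w8a8_linear(x, w, rot) - ref) / np.linalg.norm(ref)
        print(f"{name}: relative error {err:.4f}")
```

Real rotation-based schemes typically use structured rotations (for example Hadamard-style transforms) that are cheap to apply and can be folded into the weights offline; the dense random matrix here is only for readability.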

Intended Use

Use this checkpoint in ComfyUI through the ComfyUI-INT8-Fast custom node. I can also share my INT8 Pi-Flow loader on request. The checkpoint has also been tested and works with the Flux2TurboComfyv2 low-step LoRA from here, using the "pre_lora" loader.

How to Use

Install ComfyUI.
Install Triton and ComfyKitchen (this model was tested and works with CUDA 12.8).
Install the custom node:
- BobJohnson24/ComfyUI-INT8-Fast
Download this checkpoint from the Hugging Face repository (solphor/Flux2-Dev-INT8-W8A8-Convrot-Model).
Download the text encoder and VAE from here.
Place the files in the diffusion_models, text_encoders and vae subfolders expected by your ComfyUI setup.
Load it with the INT8 model loader node from ComfyUI-INT8-Fast.
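
The file placement step can be scripted. The sketch below assumes ComfyUI's default layout (models/diffusion_models, models/text_encoders and models/vae under the ComfyUI root; an extra_model_paths.yaml may point elsewhere). The helper name place_model_files and the component keys are hypothetical, not part of ComfyUI or ComfyUI-INT8-Fast.

```python
import shutil
from pathlib import Path

# Default ComfyUI model subfolders, relative to the ComfyUI root (assumed layout).
SUBFOLDERS = {
    "diffusion_model": Path("models") / "diffusion_models",
    "text_encoder": Path("models") / "text_encoders",
    "vae": Path("models") / "vae",
}


def place_model_files(comfy_root, downloads):
    """Move downloaded files into the ComfyUI subfolder for their component.

    `downloads` maps a local file path to one of the SUBFOLDERS keys.
    Returns the list of destination paths.
    """
    root = Path(comfy_root)
    placed = []
    for src, kind in downloads.items():
        if kind not in SUBFOLDERS:
            raise ValueError(f"unknown component type: {kind!r}")
        dest_dir = root / SUBFOLDERS[kind]
        dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
        dest = dest_dir / Path(src).name
        shutil.move(str(src), str(dest))             # move, keeping the original file name
        placed.append(dest)
    return placed
```

For example, `place_model_files("ComfyUI", {"int8_model.safetensors": "diffusion_model", "text_encoder.safetensors": "text_encoder", "vae.safetensors": "vae"})`, where all three file names are placeholders; substitute the names of the files you actually downloaded.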

Refer to the custom node repository for current installation requirements and workflow examples.

License and Use Restrictions

This quantization is derived from black-forest-labs/FLUX.2-dev and follows the same FLUX Non-Commercial License terms. See here for the license that BobJohnson24/ComfyUI-INT8-Fast is released under.

Users are responsible for complying with the original FLUX.2-dev license, acceptable use policy, and any additional restrictions from Black Forest Labs, ComfyUI and ComfyUI-INT8-Fast.
