INT8 Quantized model of FLUX.2-klein-4B for ComfyUI

Generation speed

  • Tested on
    • RTX5090 (400 W), ComfyUI with torch 2.10.0+cu130
    • RTX3090 (280 W), ComfyUI with torch 2.9.1+cu130
    • RTX3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
  • Generated at 1216x1856, 4 steps, CFG 1.0
  • Second through fifth runs measured while varying only the seed
GPU      Run      FP8 it/s   FP8 time (s)   INT8 it/s   INT8 time (s)   Relative speedup
RTX5090  First    1.93       6.67           2.70        14.57           -118%
RTX5090  2–5 avg  2.21       3.25           1.75        3.20            +3%
RTX3090  First    0.46       16.09          0.21        23.30           -45%
RTX3090  2–5 avg  0.48       11.98          0.58        7.76            +35%
RTX3060  First    0.17       78.03          0.11        84.38           -8%
RTX3060  2–5 avg  0.20       38.96          0.25        28.42           +27%
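
The relative-speedup column appears to be derived from the wall-clock times rather than the it/s figures, i.e. (FP8 time - INT8 time) / FP8 time. A quick spot-check in Python, using the RTX3090 averaged row above:

fp8_s, int8_s = 11.98, 7.76            # RTX3090, runs 2-5 average, from the table
speedup = (fp8_s - int8_s) / fp8_s     # positive means INT8 finished faster
print(f"{speedup:+.0%}")               # prints +35%, matching the table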

Results

Sample outputs for each mode and precision (shown as images on the model page):

Mode  fp8       int8       bf16
T2I   T2I_fp8   T2I_int8   T2I_bf16
EDIT  edit_fp8  edit_int8  edit_bf16

How to run

  1. Clone https://github.com/BobJohnson24/ComfyUI-Flux2-INT8 into ComfyUI/custom_nodes (see the command sketch after this list).
  2. Run ComfyUI and search for INT8.
  3. Use the Load Diffusion Model INT8 (W8A8) node to load the model.
  4. Use the Load LoRA INT8 nodes to load LoRAs.
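
A minimal sketch of step 1, assuming a standard ComfyUI checkout (adjust the path to your install):

cd ComfyUI/custom_nodes
git clone https://github.com/BobJohnson24/ComfyUI-Flux2-INT8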

How to reproduce

  1. Install https://github.com/silveroxides/convert_to_quant.
pip install convert_to_quant
  2. Quantize the bf16 model.
convert_to_quant -i models/flux-2-klein-4b.safetensors --int8 --block_size 128 --comfy_quant --flux2 --scaling_mode tensor
  3. The quantized model will be saved as models/flux-2-klein-4b_learned_int8mixed_tensorwise.safetensors.
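
To sanity-check the result, a dtype census over the saved file shows whether the weights were actually written as INT8. This is an illustrative check of my own, not part of convert_to_quant; it assumes the output path from step 3 and needs only the safetensors and torch packages:

from collections import Counter
from safetensors import safe_open

path = "models/flux-2-klein-4b_learned_int8mixed_tensorwise.safetensors"

# Count tensors per dtype; most weights should report torch.int8,
# with higher-precision scale tensors stored alongside them.
counts = Counter()
with safe_open(path, framework="pt", device="cpu") as f:
    for key in f.keys():
        counts[str(f.get_tensor(key).dtype)] += 1

for dtype, n in counts.most_common():
    print(f"{dtype}: {n} tensors")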