INT8 Quantized model of FLUX.2-klein-4B for ComfyUI

Generation speed

  • Tested on
    • RTX5090 (400 W), ComfyUI with torch 2.10.0+cu130
    • RTX3090 (280 W), ComfyUI with torch 2.9.1+cu130
    • RTX3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
  • Generated at 1216x1856, 4 steps, CFG 1.0
  • Second through fifth runs measured while varying only the seed
GPU      Run      FP8 it/s   FP8 time (s)   INT8 it/s   INT8 time (s)   Relative speedup
RTX5090  First    1.93       6.67           2.70        14.57           -118%
RTX5090  2–5 avg  2.21       3.25           1.75        3.20            +3%
RTX3090  First    0.46       16.09          0.21        23.30           -45%
RTX3090  2–5 avg  0.48       11.98          0.58        7.76            +35%
RTX3060  First    0.17       78.03          0.11        84.38           -8%
RTX3060  2–5 avg  0.20       38.96          0.25        28.42           +27%
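
The relative-speedup column appears to be derived from the wall-clock times rather than the it/s figures, i.e. (FP8 time - INT8 time) / FP8 time. A quick spot-check in Python, using the RTX3090 averaged row above:

fp8_s, int8_s = 11.98, 7.76            # RTX3090, runs 2-5 average, from the table
speedup = (fp8_s - int8_s) / fp8_s     # positive means INT8 finished faster
print(f"{speedup:+.0%}")               # prints +35%, matching the table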

Results

Sample outputs for each mode and precision (shown as images on the model page):

Mode  fp8       int8       bf16
T2I   T2I_fp8   T2I_int8   T2I_bf16
EDIT  edit_fp8  edit_int8  edit_bf16

How to run

  1. Clone https://github.com/BobJohnson24/ComfyUI-Flux2-INT8 into ComfyUI/custom_nodes (see the command sketch after this list).
  2. Run ComfyUI and search for INT8.
  3. Use the Load Diffusion Model INT8 (W8A8) node to load the model.
  4. Use the Load LoRA INT8 nodes to load LoRAs.
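
A minimal sketch of step 1, assuming a standard ComfyUI checkout (adjust the path to your install):

cd ComfyUI/custom_nodes
git clone https://github.com/BobJohnson24/ComfyUI-Flux2-INT8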

How to reproduce

  1. Install https://github.com/silveroxides/convert_to_quant.
pip install convert_to_quant
  2. Quantize the bf16 model.
convert_to_quant -i models/flux-2-klein-4b.safetensors --int8 --block_size 128 --comfy_quant --flux2 --scaling_mode tensor
  3. The quantized model will be saved as models/flux-2-klein-4b_learned_int8mixed_tensorwise.safetensors.
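
To sanity-check the result, a dtype census over the saved file shows whether the weights were actually written as INT8. This is an illustrative check of my own, not part of convert_to_quant; it assumes the output path from step 3 and needs only the safetensors and torch packages:

from collections import Counter
from safetensors import safe_open

path = "models/flux-2-klein-4b_learned_int8mixed_tensorwise.safetensors"

# Count tensors per dtype; most weights should report torch.int8,
# with higher-precision scale tensors stored alongside them.
counts = Counter()
with safe_open(path, framework="pt", device="cpu") as f:
    for key in f.keys():
        counts[str(f.get_tensor(key).dtype)] += 1

for dtype, n in counts.most_common():
    print(f"{dtype}: {n} tensors")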