# INT8 Quantized model of FLUX.2-klein-4B for ComfyUI
## Generation speed

- Tested on:
  - RTX 5090 (400 W), ComfyUI with torch 2.10.0+cu130
  - RTX 3090 (280 W), ComfyUI with torch 2.9.1+cu130
  - RTX 3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
- Generation settings: 1216x1856, 4 steps, CFG 1.0
- The second through fifth runs were measured while varying only the seed
| GPU | Run Type | FP8 it/s | FP8 Time (s) | INT8 it/s | INT8 Time (s) | Relative Speedup (%) |
|---|---|---|---|---|---|---|
| RTX 5090 | First | 1.93 | 6.67 | 2.70 | 14.57 | -118% |
| RTX 5090 | 2~5 avg | 2.21 | 3.25 | 1.75 | 3.20 | +3% |
| RTX 3090 | First | 0.46 | 16.09 | 0.21 | 23.30 | -45% |
| RTX 3090 | 2~5 avg | 0.48 | 11.98 | 0.58 | 7.76 | +35% |
| RTX 3060 | First | 0.17 | 78.03 | 0.11 | 84.38 | -8% |
| RTX 3060 | 2~5 avg | 0.20 | 38.96 | 0.25 | 28.42 | +27% |
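The "Relative Speedup (%)" column appears to be derived from the total time columns rather than the sampler it/s: speedup = (FP8 time - INT8 time) / FP8 time x 100. For example, for the RTX 3090 averaged runs, (11.98 - 7.76) / 11.98 = +35%. First runs presumably include one-time loading overhead, which is why INT8 looks worse there than on the warm (2~5) runs.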
## Result
## How to run

- Pull https://github.com/BobJohnson24/ComfyUI-Flux2-INT8 into `ComfyUI/custom_nodes` (see the sketch after this list).
- Run ComfyUI and search for "INT8".
- Use the `Load Diffusion Model INT8 (W8A8)` node for model loading.
- Use the `Load LoRA INT8` nodes for LoRA loading.
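A minimal way to do the first step, assuming a standard git checkout of ComfyUI:

```
# Clone the INT8 custom node pack into ComfyUI's custom_nodes directory,
# then restart ComfyUI so the new nodes are registered.
cd ComfyUI/custom_nodes
git clone https://github.com/BobJohnson24/ComfyUI-Flux2-INT8
```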
## How to reproduce

- Install the quantization tool:

  ```
  pip install convert_to_quant
  ```

- Quantize the BF16 model:

  ```
  convert_to_quant -i models/flux-2-klein-4b.safetensors --int8 --block_size 128 --comfy_quant --flux2 --scaling_mode tensor
  ```

- The quantized model will be saved as `models/flux-2-klein-4b_learned_int8mixed_tensorwise.safetensors`.
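To sanity-check the output file, the tensor dtypes can be listed with the standard `safetensors` Python package. This is only an illustrative check, not part of convert_to_quant's documented workflow:

```
python -c "
from collections import Counter
from safetensors import safe_open

path = 'models/flux-2-klein-4b_learned_int8mixed_tensorwise.safetensors'
with safe_open(path, framework='pt') as f:
    # Count tensors by dtype; an INT8 checkpoint should show int8 weights
    # alongside higher-precision scale/bias tensors.
    print(Counter(str(f.get_tensor(k).dtype) for k in f.keys()))
"
```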
Base model: black-forest-labs/FLUX.2-klein-4B