Int8-Tensorwise Quantized model of ANIMA

!! You need a custom node to run this model on ComfyUI. See below !!

Generation Speed

  • Tested on
    • RTX5090 (400W), ComfyUI with torch2.10.0+cu130
    • RTX3090 (280W), ComfyUI with torch2.9.1+cu130
    • RTX3060 (PCIe4.0 x4), ComfyUI with torch2.9.1+cu130
  • Generated at 832x1216, cfg 4.0, 30 steps, er_sde sampler, simple scheduler
  • Torch Compile + Sage attention (Both from KJNodes)
    • The RTX3090 runs FP8 without Torch Compile, because official Triton uses fp8e4nv, which is not supported on the RTX 3000 series (triton-windows supports it).
  • Measured on the second run
| GPU  | BF16               | FP8                | INT8               | Speed vs BF16 (%) |
| ---- | ------------------ | ------------------ | ------------------ | ----------------- |
| 5090 | 6.30 it/s (5.04 s) | 7.20 it/s (4.86 s) | 8.46 it/s (4.24 s) | +18.8%            |
| 3090 | 1.70 it/s (19.36 s) | 1.55 it/s (20.26 s) | 2.58 it/s (12.62 s) | +53.3%          |
| 3060 | 1.06 it/s (29.47 s) | 1.07 it/s (28.91 s) | 1.33 it/s (23.43 s) | +25.7%          |

Sample

| Quant | Sample |
| ----- | ------ |
| BF16  | bf16   |
| FP8   | fp8    |
| INT8  | int8_3 |

How to use

  1. Clone ComfyUI-Flux2-INT8 into the custom_nodes directory.
  2. Modify int8_unet_loader.py as shown below:
    diff --git a/int8_unet_loader.py b/int8_unet_loader.py
    index 5fee67a..f5529fd 100644
    --- a/int8_unet_loader.py
    +++ b/int8_unet_loader.py
    @@ -21,7 +21,7 @@ class UNetLoaderINTW8A8:
                 "required": {
                     "unet_name": (folder_paths.get_filename_list("diffusion_models"),),
                     "weight_dtype": (["default", "fp8_e4m3fn", "fp16", "bf16"],),
    -                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen"],),
    +                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen", "prequantized"],),
                 }
             }
    
    @@ -77,6 +77,8 @@ class UNetLoaderINTW8A8:
                     'audio_scale_shift_table', 'av_ca_a2v_gate_adaln_single', 'av_ca_audio_scale_shift_adaln_single', 'av_ca_v2a_gate_adaln_single',
                     'av_ca_video_scale_shift_adaln_single', 'caption_projection', 'patchify_proj', 'proj_out', 'scale_shift_table',
                 ]
    +        elif model_type == "prequantized":
    +            Int8TensorwiseOps.excluded_names = ["."]
                 #print(f"Applying model-specific exclusions to Int8TensorwiseOps: {Int8TensorwiseOps.excluded_names}")
    
             # Load model directly - Int8TensorwiseOps handles int8 weights natively
    
  3. Use the Load Diffusion Model INT8 (W8A8) node to load the model, and set model_type to prequantized.
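The patch above presumably works because `Int8TensorwiseOps.excluded_names` is matched as substrings against parameter names, and every parameter name contains a "." — so `["."]` skips load-time quantization entirely, which is what you want for a checkpoint whose weights are already int8. A minimal sketch of that assumed matching logic (the function name `should_quantize` is hypothetical, not from the custom node):

```python
# Hypothetical sketch of substring-based exclusion, assuming excluded_names
# entries are matched as substrings of the full parameter name.
def should_quantize(param_name: str, excluded_names: list[str]) -> bool:
    """Return True if this layer should be quantized at load time."""
    return not any(excl in param_name for excl in excluded_names)

# With the "prequantized" patch, every name contains ".", so nothing is re-quantized:
assert not should_quantize("net.blocks.0.q_proj.weight", ["."])
# A normal model type only excludes specific layers:
assert should_quantize("net.blocks.1.q_proj.weight", ["scale_shift_table"])
```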

Quantized layers

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
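One plausible reading of the rule list above: for each layer under `net.blocks.`, the first rule whose `match` substrings hit the layer name decides its policy (`keep` = left unquantized, `int8_tensorwise` = quantized to int8). A sketch under that assumption (the `policy_for` helper and the fall-through default are hypothetical, not from the quantization tool):

```python
# Hypothetical first-match-wins interpretation of the "rules" list above.
RULES = [
    {"policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"]},
    {"policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"]},
]

def policy_for(layer_name: str, rules=RULES) -> str:
    """Return the policy of the first rule whose match substrings hit the name."""
    for rule in rules:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    return "keep"  # assumed default: unmatched layers stay unquantized

print(policy_for("net.blocks.0.q_proj"))      # keep (first block kept whole)
print(policy_for("net.blocks.5.mlp.layer1"))  # int8_tensorwise
print(policy_for("net.blocks.5.mlp.layer2"))  # keep (matched by ".mlp.layer2")
```

Under this reading, block 0, the adaLN modulation layers, and each MLP's second linear stay in full precision, while attention projections and the remaining MLP linears are quantized.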