Int8-Tensorwise Quantized model of ANIMA

!! You need a custom node to run this model in ComfyUI. See below !!

Generation Speed

  • Tested on
    • RTX5090 (400W), ComfyUI with torch2.10.0+cu130
    • RTX3090 (280W), ComfyUI with torch2.9.1+cu130
    • RTX3060 (PCIe4.0 x4), ComfyUI with torch2.9.1+cu130
  • Generated at 832x1216, CFG 4.0, 30 steps, er_sde sampler, simple scheduler
  • Torch Compile + Sage Attention (both from KJNodes)
    • RTX 3090 runs FP8 without Torch Compile, because official Triton uses fp8e4nv, which is not supported on the RTX 3000 series (triton-windows does support it).
    • INT8Rowwise runs without Torch Compile (compilation is not supported for it).
  • Speeds measured on the second run (after warm-up)
| GPU | BF16 | FP8 | INT8 | INT8Rowwise | INT8 vs BF16 |
| --- | --- | --- | --- | --- | --- |
| RTX 5090 | 6.30 it/s (5.04 s) | 7.20 it/s (4.86 s) | 8.46 it/s (4.24 s) | 6.20 it/s (5.36 s) | +18.8% |
| RTX 3090 | 1.70 it/s (19.36 s) | 1.55 it/s (20.26 s) | 2.58 it/s (12.62 s) | 1.79 it/s (18.04 s) | +53.3% |
| RTX 3060 | 1.06 it/s (29.47 s) | 1.07 it/s (28.91 s) | 1.33 it/s (23.43 s) | 1.06 it/s (28.91 s) | +25.7% |

The "INT8 vs BF16" column compares total generation time, not it/s.

Sample

Sample images for each quantization variant: BF16 (`bf16`), FP8 (`fp8`), INT8 (`int8_3`), INT8Rowwise (`int8rowwise`).

How to use

  1. Clone ComfyUI-Flux2-INT8 into your custom_nodes directory.
  2. Use the Load Diffusion Model INT8 (W8A8) node to load the model, and set on_the_fly_quantization to False (default).

Quantized layers

INT8Tensorwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
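The `int8_tensorwise` policy stores a single scale for the whole weight tensor. A minimal NumPy sketch of how a W8A8 tensorwise linear layer works in principle (illustrative only; the custom node's actual implementation may differ):

```python
import numpy as np

def quantize_tensorwise(t):
    # One scale for the entire tensor: absmax / 127
    scale = np.abs(t).max() / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x, w):
    # W8A8: quantize both activations and weights, do the matmul in
    # int32 accumulation, then dequantize with the product of scales.
    qw, sw = quantize_tensorwise(w)
    qx, sx = quantize_tensorwise(x)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T
    return acc.astype(np.float32) * (sx * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)   # activations
w = rng.standard_normal((32, 64)).astype(np.float32)  # weight (out, in)
y = int8_linear(x, w)
ref = x @ w.T
# Relative error introduced by 8-bit quantization stays small
err = np.abs(y - ref).max() / np.abs(ref).max()
```

The `"keep"` rules above exclude the first block and the modulation layers from quantization, which is a common way to preserve quality in the most sensitive layers.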

INT8Rowwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0.", "adaln_modulation", ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"] },
    { "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
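`int8_rowwise` keeps a separate scale per output row instead of one per tensor, which helps when row magnitudes vary widely. A hedged NumPy sketch (not the node's actual code) comparing weight reconstruction error of the two schemes on such a weight:

```python
import numpy as np

rng = np.random.default_rng(0)
# Weight whose rows span four orders of magnitude: a hard case for a
# single tensorwise scale, since one large row dominates the absmax.
row_gain = 10.0 ** rng.uniform(-2, 2, size=(32, 1))
w = (rng.standard_normal((32, 64)) * row_gain).astype(np.float32)

def dequant_error(w, scale):
    # Quantize to int8 with the given scale(s), dequantize, measure error.
    q = np.clip(np.round(w / scale), -127, 127)
    return np.abs(q * scale - w).mean()

tensorwise = dequant_error(w, np.abs(w).max() / 127.0)  # one global scale
rowwise = dequant_error(w, np.abs(w).max(axis=1, keepdims=True) / 127.0)  # per-row
```

With per-row scales, small-magnitude rows keep their precision instead of being rounded toward zero by a scale set by the largest row, so `rowwise` comes out well below `tensorwise` here.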