Int8-Tensorwise Quantized model of ANIMA

!! You need a custom node to run this model in ComfyUI. See below !!

Generation Speed

  • Tested on
    • RTX5090 (400W), ComfyUI with torch2.10.0+cu130
    • RTX3090 (280W), ComfyUI with torch2.9.1+cu130
    • RTX3060 (PCIe4.0 x4), ComfyUI with torch2.9.1+cu130
  • Generated at 832x1216, CFG 4.0, 30 steps, er_sde sampler, simple scheduler
  • Torch Compile + Sage Attention (both from KJNodes)
    • RTX 3090 runs FP8 without Torch Compile, because official Triton uses fp8e4nv, which is not supported on the RTX 3000 series (triton-windows does support it).
    • INT8Rowwise runs without Torch Compile (compilation is not supported for it).
  • Speeds measured on the second run (after warm-up)
| GPU | BF16 | FP8 | INT8 | INT8Rowwise | INT8 vs BF16 |
| --- | --- | --- | --- | --- | --- |
| RTX 5090 | 6.30 it/s (5.04 s) | 7.20 it/s (4.86 s) | 8.46 it/s (4.24 s) | 6.20 it/s (5.36 s) | +18.8% |
| RTX 3090 | 1.70 it/s (19.36 s) | 1.55 it/s (20.26 s) | 2.58 it/s (12.62 s) | 1.79 it/s (18.04 s) | +53.3% |
| RTX 3060 | 1.06 it/s (29.47 s) | 1.07 it/s (28.91 s) | 1.33 it/s (23.43 s) | 1.06 it/s (28.91 s) | +25.7% |

The "INT8 vs BF16" column compares total generation time, not it/s.

Sample

Sample images for each quantization variant: BF16 (`bf16`), FP8 (`fp8`), INT8 (`int8_3`), INT8Rowwise (`int8rowwise`).

How to use

  1. Clone ComfyUI-Flux2-INT8 into your custom_nodes directory.
  2. Use the Load Diffusion Model INT8 (W8A8) node to load the model, and set on_the_fly_quantization to False (default).

Quantized layers

INT8Tensorwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
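The `int8_tensorwise` policy stores a single scale for the whole weight tensor. A minimal NumPy sketch of how a W8A8 tensorwise linear layer works in principle (illustrative only; the custom node's actual implementation may differ):

```python
import numpy as np

def quantize_tensorwise(t):
    # One scale for the entire tensor: absmax / 127
    scale = np.abs(t).max() / 127.0
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_linear(x, w):
    # W8A8: quantize both activations and weights, do the matmul in
    # int32 accumulation, then dequantize with the product of scales.
    qw, sw = quantize_tensorwise(w)
    qx, sx = quantize_tensorwise(x)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T
    return acc.astype(np.float32) * (sx * sw)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)   # activations
w = rng.standard_normal((32, 64)).astype(np.float32)  # weight (out, in)
y = int8_linear(x, w)
ref = x @ w.T
# Relative error introduced by 8-bit quantization stays small
err = np.abs(y - ref).max() / np.abs(ref).max()
```

The `"keep"` rules above exclude the first block and the modulation layers from quantization, which is a common way to preserve quality in the most sensitive layers.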

INT8Rowwise

{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0.", "adaln_modulation", ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"] },
    { "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
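`int8_rowwise` keeps a separate scale per output row instead of one per tensor, which helps when row magnitudes vary widely. A hedged NumPy sketch (not the node's actual code) comparing weight reconstruction error of the two schemes on such a weight:

```python
import numpy as np

rng = np.random.default_rng(0)
# Weight whose rows span four orders of magnitude: a hard case for a
# single tensorwise scale, since one large row dominates the absmax.
row_gain = 10.0 ** rng.uniform(-2, 2, size=(32, 1))
w = (rng.standard_normal((32, 64)) * row_gain).astype(np.float32)

def dequant_error(w, scale):
    # Quantize to int8 with the given scale(s), dequantize, measure error.
    q = np.clip(np.round(w / scale), -127, 127)
    return np.abs(q * scale - w).mean()

tensorwise = dequant_error(w, np.abs(w).max() / 127.0)  # one global scale
rowwise = dequant_error(w, np.abs(w).max(axis=1, keepdims=True) / 127.0)  # per-row
```

With per-row scales, small-magnitude rows keep their precision instead of being rounded toward zero by a scale set by the largest row, so `rowwise` comes out well below `tensorwise` here.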