# Int8-Tensorwise Quantized model of ANIMA
!! You need a custom node to run this model on ComfyUI. See below !!
## Generation Speed

- Tested on:
  - RTX 5090 (400 W), ComfyUI with torch 2.10.0+cu130
  - RTX 3090 (280 W), ComfyUI with torch 2.9.1+cu130
  - RTX 3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
- Generates 832x1216, cfg 4.0, 30 steps, er_sde sampler, simple scheduler
- Torch Compile + Sage Attention (both from KJNodes)
  - RTX 3090 runs FP8 without Torch Compile, because official Triton uses fp8e4nv, which is not supported on the RTX 3000 series (triton-windows supports it).
  - INT8Rowwise runs without Torch Compile (it doesn't support compilation).
- Second run measured
| GPU | BF16 | FP8 | INT8 | INT8Rowwise | INT8 speedup vs BF16 (by time) |
|---|---|---|---|---|---|
| 5090 | 6.30it/s (5.04s) | 7.20it/s (4.86s) | 8.46it/s (4.24s) | 6.20it/s (5.36s) | +18.8% |
| 3090 | 1.70it/s (19.36s) | 1.55it/s (20.26s) | 2.58it/s (12.62s) | 1.79it/s (18.04s) | +53.3% |
| 3060 | 1.06it/s (29.47s) | 1.07it/s (28.91s) | 1.33it/s (23.43s) | 1.06it/s (28.91s) | +25.7% |
## Sample
## How to use

- Clone ComfyUI-Flux2-INT8 into the `custom_nodes` directory.
- Use the `Load Diffusion Model INT8 (W8A8)` node to load the model, and set `on_the_fly_quantization` to False (the default).
## Quantized layers

### INT8Tensorwise

```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
```
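The `int8_tensorwise` policy quantizes each matched layer with a single scale per tensor, for both weights and activations (W8A8). The node's actual CUDA kernels are not part of this card; the following is only a minimal NumPy sketch of the idea, assuming standard symmetric int8 quantization (all function names here are illustrative, not the node's API):

```python
import numpy as np

def quantize_tensorwise(t):
    """Symmetric int8 quantization with one scale for the whole tensor."""
    scale = np.abs(t).max() / 127.0          # single tensorwise scale
    q = np.clip(np.round(t / scale), -128, 127).astype(np.int8)
    return q, scale

def int8_linear(x, w):
    """W8A8 linear: int8 activation x int8 weight, int32 accumulate, rescale."""
    qw, w_scale = quantize_tensorwise(w)
    qx, x_scale = quantize_tensorwise(x)
    acc = qx.astype(np.int32) @ qw.astype(np.int32).T   # integer matmul
    return acc.astype(np.float32) * (x_scale * w_scale)  # dequantize output

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 64)).astype(np.float32)    # activations
w = rng.standard_normal((32, 64)).astype(np.float32)   # weight matrix
ref = x @ w.T                                          # float reference
out = int8_linear(x, w)
err = np.abs(out - ref).max() / np.abs(ref).max()
print(f"max relative error: {err:.4f}")
```

The speedup in the table above comes from the integer matmul step: int8 tensor cores have much higher throughput than BF16, at the cost of a small quantization error like the one printed here.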
### INT8Rowwise

```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0.", "adaln_modulation", ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"] },
    { "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
```
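The `int8_rowwise` policy keeps one scale per weight row (one per output channel) instead of one scale for the whole tensor, so an outlier row no longer inflates everyone else's quantization step. A minimal NumPy sketch of the difference, assuming symmetric int8 quantization (names are illustrative):

```python
import numpy as np

def quantize_rowwise(w):
    """Symmetric int8 quantization with one scale per row (output channel)."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0   # shape (rows, 1)
    q = np.clip(np.round(w / scales), -128, 127).astype(np.int8)
    return q, scales

rng = np.random.default_rng(0)
w = rng.standard_normal((32, 64)).astype(np.float32)
w[0] *= 50.0   # one outlier row, as can occur in trained weights

# Tensorwise: a single scale must cover the outlier row too.
t_scale = np.abs(w).max() / 127.0
wt = np.clip(np.round(w / t_scale), -128, 127).astype(np.int8) * t_scale

# Rowwise: the outlier only affects its own row's scale.
qr, r_scales = quantize_rowwise(w)
wr = qr.astype(np.float32) * r_scales

err_t = np.abs(wt - w)[1:].max()   # reconstruction error on normal rows
err_r = np.abs(wr - w)[1:].max()
print(f"tensorwise err {err_t:.4f} vs rowwise err {err_r:.4f}")
```

Rowwise quantization trades a little speed (see the table above, where INT8Rowwise runs near BF16 speed without Torch Compile) for better accuracy on tensors with uneven per-channel ranges.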
Base model: circlestone-labs/Anima