---
license: other
license_name: circlestone-labs-non-commercial-license
license_link: https://huggingface.co/circlestone-labs/Anima/blob/main/LICENSE.md
base_model:
- circlestone-labs/Anima
base_model_relation: quantized
tags:
- ComfyUI
---
# Int8-Tensorwise Quantized Model of ANIMA
!! You need a custom node to run this model in ComfyUI. See below !!
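For orientation, here is a minimal sketch of what "Int8 tensorwise" W8A8 quantization means, assuming the common absmax scheme with a single scale per tensor; the function and variable names are illustrative, not the custom node's actual API:

```python
import torch

def quantize_int8_tensorwise(t: torch.Tensor):
    """Absmax quantization: one float scale for the whole tensor, int8 values."""
    scale = t.abs().amax().clamp(min=1e-8) / 127.0
    q = torch.round(t / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

# W8A8: the stored weight (W8) and the runtime activation (A8) are both int8.
w = torch.randn(256, 256)   # a linear layer's weight, shipped as int8 + scale
x = torch.randn(8, 256)     # activations, quantized on the fly at inference
w_q, w_s = quantize_int8_tensorwise(w)
x_q, x_s = quantize_int8_tensorwise(x)
# The int8 x int8 matmul accumulates in a wider dtype; rescaling by both
# scales recovers an approximation of the float result.
y = (x_q.to(torch.int32) @ w_q.to(torch.int32).t()).float() * (x_s * w_s)
```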
## Generation Speed

- Tested on:
  - RTX 5090 (400 W), ComfyUI with torch 2.10.0+cu130
  - RTX 3090 (280 W), ComfyUI with torch 2.9.1+cu130
  - RTX 3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
- Settings: 832x1216, CFG 4.0, 30 steps, er_sde sampler, simple scheduler
- Torch Compile + Sage Attention (both from KJNodes)
  - Exception: the RTX 3090 runs FP8 without Torch Compile, because official Triton uses fp8e4nv, which is not supported on the RTX 3000 series (triton-windows does support it).
- Measured on the second run; table entries are it/s (total time). The speedup column is derived from the total times, as checked below the table.
| GPU | BF16 | FP8 | INT8 | INT8 speedup vs BF16 |
|---|---|---|---|---|
| 5090 | 6.30it/s (5.04s) | 7.20it/s (4.86s) | 8.46it/s (4.24s) | +18.8% |
| 3090 | 1.70it/s (19.36s) | 1.55it/s (20.26s) | 2.58it/s (12.62s) | +53.3% |
| 3060 | 1.06it/s (29.47s) | 1.07it/s (28.91s) | 1.33it/s (23.43s) | +25.7% |
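The last column compares INT8 total time to BF16 total time (not it/s); a quick check:

```python
# Speedup = BF16 total time / INT8 total time - 1, using the table above.
for gpu, t_bf16, t_int8 in [("5090", 5.04, 4.24), ("3090", 19.36, 12.62), ("3060", 29.47, 23.43)]:
    print(f"RTX {gpu}: +{(t_bf16 / t_int8 - 1) * 100:.1f}%")
# Matches the table's column up to rounding of the displayed times.
```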
## Sample
## How to use
- Clone ComfyUI-Flux2-INT8 into the `custom_nodes` directory.
- Modify `int8_unet_loader.py` as below:

  ```diff
  diff --git a/int8_unet_loader.py b/int8_unet_loader.py
  index 5fee67a..f5529fd 100644
  --- a/int8_unet_loader.py
  +++ b/int8_unet_loader.py
  @@ -21,7 +21,7 @@ class UNetLoaderINTW8A8:
               "required": {
                   "unet_name": (folder_paths.get_filename_list("diffusion_models"),),
                   "weight_dtype": (["default", "fp8_e4m3fn", "fp16", "bf16"],),
  -                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen"],),
  +                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen", "prequantized"],),
               }
           }
  @@ -77,6 +77,8 @@ class UNetLoaderINTW8A8:
                   'audio_scale_shift_table', 'av_ca_a2v_gate_adaln_single', 'av_ca_audio_scale_shift_adaln_single',
                   'av_ca_v2a_gate_adaln_single', 'av_ca_video_scale_shift_adaln_single', 'caption_projection',
                   'patchify_proj', 'proj_out', 'scale_shift_table',
               ]
  +        elif model_type == "prequantized":
  +            Int8TensorwiseOps.excluded_names = ["."]
           #print(f"Applying model-specific exclusions to Int8TensorwiseOps: {Int8TensorwiseOps.excluded_names}")
           # Load model directly - Int8TensorwiseOps handles int8 weights natively
  ```

- Use the `Load Diffusion Model INT8 (W8A8)` node to load the model, and set `model_type` to `prequantized` (see the note on the `prequantized` exclusion after this list).
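A note on the `prequantized` branch above: presumably `excluded_names` holds substrings matched against qualified module names, and any layer whose name contains one of them is skipped by the quantizer. Since every qualified name contains a `.`, setting it to `["."]` excludes all layers from re-quantization at load time, which is what you want when the checkpoint's weights are already INT8. A hypothetical reconstruction of that check (an inference about the loader's internals, not documented behavior):

```python
def is_excluded(module_name: str, excluded_names: list[str]) -> bool:
    # Hypothetical substring matching. "net.blocks.0.attn.q_proj"
    # contains ".", so excluded_names = ["."] excludes every layer.
    return any(pattern in module_name for pattern in excluded_names)
```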
## Quantized layers
```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
```
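Read in order, the rules say: layers matching a `keep` pattern stay in their original dtype, and the remaining attention and MLP projections are stored as INT8 with tensorwise scales. A minimal sketch of how such a policy table could be evaluated per layer name (the semantics, first match wins on substring containment, are an assumption, not the quantizer's documented behavior):

```python
RULES = [
    ("keep",            ["blocks.0", "adaln_modulation", ".mlp.layer2"]),
    ("int8_tensorwise", ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"]),
]

def policy_for(name: str) -> str:
    """Return the first rule whose patterns appear in the layer name."""
    for policy, patterns in RULES:
        if any(p in name for p in patterns):
            return policy
    return "keep"  # layers outside net.blocks. are left untouched

print(policy_for("net.blocks.0.self_attn.q_proj"))  # keep ("blocks.0" wins first)
print(policy_for("net.blocks.7.self_attn.q_proj"))  # int8_tensorwise
print(policy_for("net.blocks.7.mlp.layer2"))        # keep
```

Note that `keep` is listed first, so the first transformer block, the AdaLN modulation layers, and the second MLP linear in each block stay in full precision even though they also match the INT8 patterns.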


