---
license: other
license_name: circlestone-labs-non-commercial-license
license_link: https://huggingface.co/circlestone-labs/Anima/blob/main/LICENSE.md
base_model:
- circlestone-labs/Anima
base_model_relation: quantized
tags:
- ComfyUI
---
|
|
|
|
|
# Int8-Tensorwise Quantized Model of ANIMA
|
|
|
|
|
## !! You need a custom node to run this model in ComfyUI. See below !!
|
|
|
|
|
## Generation Speed |
|
|
|
|
|
- Tested on:
  - RTX 5090 (400 W), ComfyUI with torch 2.10.0+cu130
  - RTX 3090 (280 W), ComfyUI with torch 2.9.1+cu130
  - RTX 3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
- Generates 832x1216, cfg 4.0, 30 steps, er_sde sampler, simple scheduler
- Torch Compile + Sage Attention (both from KJNodes)
  - *The RTX 3090 runs FP8 without Torch Compile, because official Triton uses fp8e4nv, which is not supported on the RTX 3000 series (triton-windows does support it).*
- Second run measured
|
|
|
|
|
|GPU |BF16 |FP8 |INT8 |INT8 speedup vs BF16 (by total time)|
|----|-------------------|-------------------|-----------------------|-----------------|
|5090| 6.30it/s (5.04s) | 7.20it/s (4.86s) | **8.46it/s (4.24s)** | **+18.8%** |
|3090| 1.70it/s (19.36s) | *1.55it/s (20.26s)* | **2.58it/s (12.62s)** | **+53.3%** |
|3060| 1.06it/s (29.47s) | 1.07it/s (28.91s) | **1.33it/s (23.43s)** | **+25.7%** |
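For context on where the speedup comes from: int8 tensorwise quantization stores each weight tensor as int8 values plus a single per-tensor scale factor. A minimal illustrative sketch of that idea (not the actual ComfyUI-Flux2-INT8 implementation):

```python
def quantize_tensorwise(weights):
    """Quantize a list of floats to int8 with one per-tensor scale."""
    # One scale for the whole tensor: map the largest magnitude to 127.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

q, s = quantize_tensorwise([0.5, -1.27, 0.01])
print(q)                 # [50, -127, 1]
print(dequantize(q, s))  # ~[0.5, -1.27, 0.01]
```

Because only one scale is kept per tensor, the overhead over raw int8 storage is negligible, at the cost of some precision for tensors with outlier weights.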
|
|
|
|
|
## Sample |
|
|
|
|
|
|quant|sample|
|-----|------|
|BF16 ||
|FP8 ||
|INT8 ||
|
|
|
|
|
## How to use |
|
|
|
|
|
1. Clone [ComfyUI-Flux2-INT8](https://github.com/BobJohnson24/ComfyUI-Flux2-INT8) into your `custom_nodes` directory.
2. Modify `int8_unet_loader.py` as shown below.
|
|
```diff
diff --git a/int8_unet_loader.py b/int8_unet_loader.py
index 5fee67a..f5529fd 100644
--- a/int8_unet_loader.py
+++ b/int8_unet_loader.py
@@ -21,7 +21,7 @@ class UNetLoaderINTW8A8:
             "required": {
                 "unet_name": (folder_paths.get_filename_list("diffusion_models"),),
                 "weight_dtype": (["default", "fp8_e4m3fn", "fp16", "bf16"],),
-                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen"],),
+                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen", "prequantized"],),
             }
         }
 
@@ -77,6 +77,8 @@ class UNetLoaderINTW8A8:
                 'audio_scale_shift_table', 'av_ca_a2v_gate_adaln_single', 'av_ca_audio_scale_shift_adaln_single', 'av_ca_v2a_gate_adaln_single',
                 'av_ca_video_scale_shift_adaln_single', 'caption_projection', 'patchify_proj', 'proj_out', 'scale_shift_table',
             ]
+        elif model_type == "prequantized":
+            Int8TensorwiseOps.excluded_names = ["."]
         #print(f"Applying model-specific exclusions to Int8TensorwiseOps: {Int8TensorwiseOps.excluded_names}")
 
         # Load model directly - Int8TensorwiseOps handles int8 weights natively
```
|
|
3. Use the `Load Diffusion Model INT8 (W8A8)` node to load the model, and set `model_type` to **prequantized**.
|
|
 |
|
|
|
|
|
|
|
|
## Quantized layers |
|
|
|
|
|
```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
```
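Assuming the `rules` list is applied in order with substring matching (first match wins — which is what keeps `blocks.0` in full precision even though it contains `q_proj` layers), layer selection can be sketched as follows. The function name and the `keep` default are illustrative, not part of the actual `comfy_quant` loader:

```python
RULES = [
    {"policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"]},
    {"policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"]},
]

def policy_for(layer_name, rules=RULES, default="keep"):
    # First rule with a matching substring wins (assumed ordering semantics).
    for rule in rules:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    return default

print(policy_for("net.blocks.0.q_proj"))       # keep (blocks.0 matched first)
print(policy_for("net.blocks.12.q_proj"))      # int8_tensorwise
print(policy_for("net.blocks.12.mlp.layer2"))  # keep
```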
|
|
|
|
|
|