---
license: other
license_name: circlestone-labs-non-commercial-license
license_link: https://huggingface.co/circlestone-labs/Anima/blob/main/LICENSE.md
base_model:
- circlestone-labs/Anima
base_model_relation: quantized
tags:
- ComfyUI
---

# Int8-Tensorwise Quantized model of ANIMA

## !! You need a custom node to run this model in ComfyUI. See below !!

## Generation Speed

- Tested on
  - RTX 5090 (400 W), ComfyUI with torch 2.10.0+cu130
  - RTX 3090 (280 W), ComfyUI with torch 2.9.1+cu130
  - RTX 3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
- Generation settings: 832x1216, cfg 4.0, 30 steps, er_sde sampler, simple scheduler
- Torch Compile + Sage Attention (both from KJNodes)
- *The RTX 3090 FP8 result was measured without Torch Compile, because official Triton uses fp8e4nv, which is not supported on the RTX 3000 series (triton-windows does support it).*
- Second run measured

|GPU |BF16 |FP8 |INT8 |Speedup vs BF16 (by total time)|
|----|-------------------|---------------------|------------------------|-------------------------------|
|5090| 6.30it/s (5.04s)  | 7.20it/s (4.86s)    | **8.46it/s (4.24s)**   | **+18.8%**                    |
|3090| 1.70it/s (19.36s) | *1.55it/s (20.26s)* | **2.58it/s (12.62s)**  | **+53.3%**                    |
|3060| 1.06it/s (29.47s) | 1.07it/s (28.91s)   | **1.33it/s (23.43s)**  | **+25.7%**                    |

## Sample

|quant|sample|
|-----|------|
|BF16 |![bf16](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/VexgsygEKNJxijfj3wt3U.webp)|
|FP8  |![fp8](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/by1OzGVCDh-_O9ce_OIaD.webp)|
|INT8 |![int8_3](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/KyBm-avUWjMkNqlH9DIWx.webp)|

## How to use

1. Clone [ComfyUI-Flux2-INT8](https://github.com/BobJohnson24/ComfyUI-Flux2-INT8) into your `custom_nodes` directory.
2. Modify `int8_unet_loader.py` as below.

```diff
diff --git a/int8_unet_loader.py b/int8_unet_loader.py
index 5fee67a..f5529fd 100644
--- a/int8_unet_loader.py
+++ b/int8_unet_loader.py
@@ -21,7 +21,7 @@ class UNetLoaderINTW8A8:
             "required": {
                 "unet_name": (folder_paths.get_filename_list("diffusion_models"),),
                 "weight_dtype": (["default", "fp8_e4m3fn", "fp16", "bf16"],),
-                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen"],),
+                "model_type": (["flux2", "z-image", "chroma", "wan", "ltx2", "qwen", "prequantized"],),
             }
         }
@@ -77,6 +77,8 @@ class UNetLoaderINTW8A8:
                 'audio_scale_shift_table', 'av_ca_a2v_gate_adaln_single', 'av_ca_audio_scale_shift_adaln_single',
                 'av_ca_v2a_gate_adaln_single', 'av_ca_video_scale_shift_adaln_single',
                 'caption_projection', 'patchify_proj', 'proj_out', 'scale_shift_table',
             ]
+        elif model_type == "prequantized":
+            Int8TensorwiseOps.excluded_names = ["."]
         #print(f"Applying model-specific exclusions to Int8TensorwiseOps: {Int8TensorwiseOps.excluded_names}")
         # Load model directly - Int8TensorwiseOps handles int8 weights natively
```

3. Use the `Load Diffusion Model INT8 (W8A8)` node to load the model, and set `model_type` to **prequantized**.

![image](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/HLRpnDr9i1rOGX-STN01d.png)

## Quantized layers

```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
```
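For readers who want to audit which layers end up quantized, here is a minimal Python sketch of one plausible reading of the rule list above. It assumes rules are evaluated in order with the first substring match winning, and that layers matching no rule are kept unquantized; this illustrates the rule format only and is not the custom node's actual implementation.

```python
# Illustrative only: one plausible interpretation of the "rules" list above.
# Assumes rules apply in order (first match wins) and unmatched layers
# default to "keep" (left in the original dtype).

RULES = [
    {"policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"]},
    {"policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"]},
]

def policy_for(layer_name: str) -> str:
    """Return the policy of the first rule with a substring match on the layer name."""
    for rule in RULES:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    return "keep"

print(policy_for("net.blocks.0.q_proj"))      # keep -- "blocks.0" matches first
print(policy_for("net.blocks.5.q_proj"))      # int8_tensorwise
print(policy_for("net.blocks.5.mlp.layer2"))  # keep -- final MLP layer excluded
```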
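As background, "int8 tensorwise" generally means each quantized weight tensor is stored as int8 values plus a single floating-point scale for the whole tensor. Below is a minimal PyTorch sketch of symmetric per-tensor int8 quantization; the function names are hypothetical and do not belong to the loader's API.

```python
import torch

def quantize_int8_tensorwise(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization: one scale for the whole tensor."""
    scale = w.abs().amax().clamp(min=1e-12) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8_tensorwise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from int8 weights and the scale."""
    return q.to(torch.float32) * scale

# Quick sanity check on a random weight matrix.
w = torch.randn(1024, 1024)
q, scale = quantize_int8_tensorwise(w)
max_err = (dequantize_int8_tensorwise(q, scale) - w).abs().max()
print(f"max abs quantization error: {max_err:.6f}")
```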