---
license: other
license_name: circlestone-labs-non-commercial-license
license_link: https://huggingface.co/circlestone-labs/Anima/blob/main/LICENSE.md
base_model:
- circlestone-labs/Anima
base_model_relation: quantized
tags:
- comfyui
- diffusion-single-file
pipeline_tag: text-to-image
---
# INT8-Tensorwise Quantized Model of ANIMA
## !! You need a custom node to run this model in ComfyUI. See below !!
## Generation Speed
- Tested on
  - RTX5090 (400W), ComfyUI with torch 2.10.0+cu130
  - RTX3090 (280W), ComfyUI with torch 2.9.1+cu130
  - RTX3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
- Generated at 832x1216, cfg 4.0, 30 steps, er_sde sampler, simple scheduler
- Torch Compile + Sage Attention (both from KJNodes)
  - *FP8 on the RTX3090 was run without Torch Compile, because official Triton uses fp8e4nv, which RTX 3000-series GPUs do not support. (triton-windows does support it.)*
  - *INT8Rowwise was run without Torch Compile, which it does not support.*
- Speed measured on the second run
|GPU |BF16 |FP8 |INT8 |Int8Rowwise |INT8 speedup vs BF16 (by total time)|
|----|-------------------|-------------------|-----------------------|-------------------|-----------------|
|5090| 6.30it/s (5.04s) | 7.20it/s (4.86s) | **8.46it/s (4.24s)** | 6.20it/s (5.36s) | **+18.8%** |
|3090| 1.70it/s (19.36s) | *1.55it/s (20.26s)* | **2.58it/s (12.62s)** | 1.79it/s (18.04s) | **+53.3%** |
|3060| 1.06it/s (29.47s) | 1.07it/s (28.91s) | **1.33it/s (23.43s)** | 1.06it/s (28.91s) | **+25.7%** |
## Sample
|quant|sample|
|-----|------|
|BF16 |![bf16](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/VexgsygEKNJxijfj3wt3U.webp)|
|FP8 |![fp8](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/by1OzGVCDh-_O9ce_OIaD.webp)|
|INT8 |![int8_3](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/KyBm-avUWjMkNqlH9DIWx.webp)|
|INT8Rowwise |![int8rowwise](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/PWhJAZbhY3u4EpqjDgwjO.webp)|
## How to use
1. Clone [ComfyUI-Flux2-INT8](https://github.com/BobJohnson24/ComfyUI-Flux2-INT8) into the `custom_nodes` directory.
2. Load the model with the `Load Diffusion Model INT8 (W8A8)` node and leave `on_the_fly_qunatization` set to **False** (the default).
![image](https://cdn-uploads.huggingface.co/production/uploads/63fbf6951b4b1bd4e706fed1/HLRpnDr9i1rOGX-STN01d.png)
## Quantized layers
### INT8Tensorwise
```json
{
"format": "comfy_quant",
"block_names": ["net.blocks."],
"rules": [
{ "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
{ "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
]
}
```
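The `int8_tensorwise` policy stores a single floating-point scale per weight tensor. As a rough illustration of what that means (a minimal sketch only, not the custom node's actual implementation), symmetric tensorwise INT8 quantization looks like:

```python
import torch

def quantize_int8_tensorwise(w: torch.Tensor):
    # One scale for the entire tensor: scale = max(|w|) / 127
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximation of the original weights
    return q.to(torch.float32) * scale

w = torch.randn(4, 8)
q, scale = quantize_int8_tensorwise(w)
w_hat = dequantize(q, scale)
```

Because a single scale covers the whole tensor, the worst-case rounding error for any element is `scale / 2`, which is why the `keep` rule above leaves the most error-sensitive layers (such as `blocks.0` and `adaln_modulation`) in their original precision.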
### INT8Rowwise
```json
{
"format": "comfy_quant",
"block_names": ["net.blocks."],
"rules": [
{ "policy": "keep", "match": ["blocks.0.", "adaln_modulation", ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"] },
{ "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
]
}
```