---
license: other
license_name: circlestone-labs-non-commercial-license
license_link: https://huggingface.co/circlestone-labs/Anima/blob/main/LICENSE.md
base_model:
- circlestone-labs/Anima
base_model_relation: quantized
tags:
- comfyui
- diffusion-single-file
pipeline_tag: text-to-image
---

# INT8-Tensorwise Quantized Model of ANIMA

## !! You need a custom node to run this model on ComfyUI. See below !!

## Generation Speed

- Tested on:
  - RTX 5090 (400 W), ComfyUI with torch 2.10.0+cu130
  - RTX 3090 (280 W), ComfyUI with torch 2.9.1+cu130
  - RTX 3060 (PCIe 4.0 x4), ComfyUI with torch 2.9.1+cu130
- Generation settings: 832x1216, CFG 4.0, 30 steps, `er_sde` sampler, `simple` scheduler
- Torch Compile + Sage Attention (both from KJNodes)
  - *The RTX 3090 runs FP8 without Torch Compile, because official Triton uses `fp8e4nv`, which is not supported on the RTX 3000 series (triton-windows does support it).*
  - *INT8Rowwise runs without Torch Compile (it does not support compilation).*
- Timings are from the second run

| GPU  | BF16               | FP8                  | INT8                   | INT8Rowwise        | INT8 vs BF16 (%) |
|------|--------------------|----------------------|------------------------|--------------------|------------------|
| 5090 | 6.30 it/s (5.04s)  | 7.20 it/s (4.86s)    | **8.46 it/s (4.24s)**  | 6.20 it/s (5.36s)  | **+18.8%**       |
| 3090 | 1.70 it/s (19.36s) | *1.55 it/s (20.26s)* | **2.58 it/s (12.62s)** | 1.79 it/s (18.04s) | **+53.3%**       |
| 3060 | 1.06 it/s (29.47s) | 1.07 it/s (28.91s)   | **1.33 it/s (23.43s)** | 1.06 it/s (28.91s) | **+25.7%**       |
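
The "INT8 vs BF16 (%)" column appears to be derived from the total wall-clock times rather than the it/s figures (the two differ because total time includes per-run overhead outside the sampling loop). A minimal sketch of that calculation:

```python
def speedup_percent(baseline_seconds: float, quantized_seconds: float) -> float:
    """Relative speedup of the quantized run over the baseline, in percent,
    based on total wall-clock time."""
    return (baseline_seconds / quantized_seconds - 1.0) * 100.0

# RTX 5090: BF16 took 5.04s, INT8 took 4.24s
print(round(speedup_percent(5.04, 4.24), 1))  # matches the table's ~+18.8% column
```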
|
| | ## Sample |
| |
|
| | |quant|sample| |
| | |-----|------| |
| | |BF16 || |
| | |FP8 || |
| | |INT8 || |
| | |INT8Rowwise || |
| |
|
| | ## How to use |
| |
|
| | 1. Cloning [ComfyUI-Flux2-INT8](https://github.com/BobJohnson24/ComfyUI-Flux2-INT8) to `custom_nodes` directory. |
| | 3. Use `Load Diffusion Model INT8 (W8A8)` node to model loading and set `on_the_fly_qunatization` to **False** (default). |
| |  |
| |
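
"W8A8" means both weights and activations run in int8: the weights are quantized offline (this checkpoint), the activations are quantized on the fly, the matmul accumulates in integer arithmetic, and the result is rescaled by the product of the two scales. A minimal per-tensor sketch of the idea in plain Python (an illustration of the numeric scheme, not the node's actual implementation):

```python
def quantize_tensorwise(values):
    """Map floats to int8 codes using a single per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def w8a8_dot(weights, activations):
    """int8 x int8 dot product, dequantized with both scales."""
    qw, sw = quantize_tensorwise(weights)
    qa, sa = quantize_tensorwise(activations)
    acc = sum(w * a for w, a in zip(qw, qa))  # integer accumulation
    return acc * sw * sa

w = [0.5, -1.0, 0.25]
a = [2.0, 1.0, -4.0]
print(w8a8_dot(w, a))  # close to the exact dot product of -1.0
```

The rowwise variant differs only in that each weight row gets its own scale instead of one scale per tensor, which trades a little extra metadata for lower quantization error.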
|
| |
|
| | ## Quantized layers |
| |
|

### INT8Tensorwise
```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
    { "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
```

### INT8Rowwise
```json
{
  "format": "comfy_quant",
  "block_names": ["net.blocks."],
  "rules": [
    { "policy": "keep", "match": ["blocks.0.", "adaln_modulation", ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"] },
    { "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
  ]
}
```
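
The `rules` lists read as ordered substring filters: a layer whose name hits a `keep` pattern stays in BF16 (the first block and the AdaLN modulation layers are sensitive to quantization), and only the remaining attention projections and MLP layers are quantized. A hypothetical sketch of how such rules could be resolved (the real matching logic lives in the custom node):

```python
def resolve_policy(layer_name, rules, default="keep"):
    """Return the policy of the first rule whose match substrings hit the name."""
    for rule in rules:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    return default

rules = [
    {"policy": "keep", "match": ["blocks.0.", "adaln_modulation"]},
    {"policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"]},
]

print(resolve_policy("net.blocks.0.q_proj", rules))   # keep: the first rule wins
print(resolve_policy("net.blocks.12.q_proj", rules))  # int8_tensorwise
```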