Int8 Quantized model of ANIMA

Notice
ComfyUI has native INT8 support as of commit 1a510f0, but it currently rejects the int8_rowwise format and offers no real speedup over BF16.
If you want to run an INT8 model using the Load Diffusion Model node, please use the int8convrot model.
(This model actually uses int8_rowwise with convrot, but it is labeled as int8_tensorwise to bypass ComfyUI's format restrictions.)
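To make the difference between the two format names concrete, here is a minimal pure-Python sketch of symmetric INT8 quantization with one scale per tensor versus one scale per row. It is an illustration of the general technique only, not the code used by ComfyUI or ComfyUI-INT8-Fast; the function names and the toy weight matrix are assumptions, and convrot is not modeled.

```python
def quantize_row(row, scale):
    """Round each value to the nearest INT8 step and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in row]


def quantize_tensorwise(weight):
    """One scale shared by the whole matrix (returned once per row for convenience)."""
    max_abs = max(abs(v) for row in weight for v in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [quantize_row(row, scale) for row in weight], [scale] * len(weight)


def quantize_rowwise(weight):
    """One scale per output row, so small rows keep their resolution."""
    q, scales = [], []
    for row in weight:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127 if max_abs > 0 else 1.0
        q.append(quantize_row(row, scale))
        scales.append(scale)
    return q, scales


def row_errors(weight, q, scales):
    """Largest absolute reconstruction error in each row after dequantizing."""
    return [
        max(abs(v - qv * s) for v, qv in zip(row, q_row))
        for row, q_row, s in zip(weight, q, scales)
    ]


# Toy matrix (hypothetical values): row 0 is tiny, row 1 is large.
weight = [[0.01, -0.02, 0.015], [4.0, -8.0, 2.0]]

q_tw, s_tw = quantize_tensorwise(weight)
q_rw, s_rw = quantize_rowwise(weight)

err_tw = row_errors(weight, q_tw, s_tw)
err_rw = row_errors(weight, q_rw, s_rw)
print("tensorwise row 0:", q_tw[0], "error", err_tw[0])
print("rowwise    row 0:", q_rw[0], "error", err_rw[0])
```

With a single tensor-wide scale, the small row collapses to zeros because the large row dictates the step size; per-row scales avoid that at the cost of storing one scale per row.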
Generation Speed
Test Environment
- ComfyUI commit 264b003
- ComfyUI-INT8-Fast commit 7ff676c
- Launched with the `--fast fp16_accumulation fp8_matrix_mult cublas_ops --use-sage-attention --disable-dynamic-vram` options
- Tested on 18/05/26
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| TDP | 170W | 280W | 400W |
| PCIe | PCIe 3.0 x4 | PCIe 4.0 x8 | PCIe 4.0 x16 |
| OS | Windows 11 | Ubuntu 24.04.4 LTS | Ubuntu 24.04.4 LTS |
| Driver | 596.49 | 580.142 | 590.48.01 |
| Python | 3.13.9 | 3.12.3 | 3.12.3 |
| torch | 2.12.0+cu130 | 2.12.0+cu130 | 2.12.0+cu130 |
| triton | 3.7.0.post26 (triton-windows) | 3.7.0 | 3.7.0 |
| sageattention | 2.2.0+cu130 | 2.2.0 | 2.2.0 |
No LoRA
832×1216 · er_sde simple · CFG 5.0 · 30 steps · No LoRA
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 0.73 it/s / 41.64s | 1.77 it/s / 18.15s | 5.09 it/s / 6.47s |
| BF16 · w/ compile | 0.95 it/s / 32.22s | 2.31 it/s / 14.16s | 6.35 it/s / 5.37s |
| INT8 · w/o compile | 0.87 it/s / 35.82s | 2.09 it/s / 15.67s | 6.32 it/s / 5.46s |
| INT8 · w/ compile | 1.13 it/s / 28.09s | 2.84 it/s / 11.77s | 8.51 it/s / 4.26s |
| Δ w/o compile (BF16→INT8) | +19.18% / +14.00% | +18.08% / +13.69% | +24.17% / +15.61% |
| Δ w/ compile (BF16→INT8) | +18.95% / +12.82% | +22.94% / +16.88% | +34.02% / +20.67% |
Hires LoRA
832×1216 · er_sde simple · CFG 5.0 · 30 steps · Hires LoRA
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 0.73 it/s / 41.73s | 1.78 it/s / 18.13s | 5.01 it/s / 7.01s |
| BF16 · w/ compile | 0.95 it/s / 32.22s | 2.04 it/s / 16.00s | 6.21 it/s / 5.47s |
| INT8 (Stoch.) · w/o compile | 0.87 it/s / 36.93s | 2.07 it/s / 15.75s | 6.41 it/s / 6.07s |
| INT8 (Stoch.) · w/ compile | 1.04 it/s / 31.47s | 2.44 it/s / 13.52s | 7.13 it/s / 5.23s |
| INT8 (Dyn.) · w/o compile | 0.70 it/s / 44.45s | 1.67 it/s / 19.32s | 4.96 it/s / 7.42s |
| INT8 (Dyn.) · w/ compile | 0.83 it/s / 37.66s | 2.00 it/s / 16.28s | 5.76 it/s / 6.05s |
| Δ Stoch. · w/o compile | +19.18% / +11.51% | +16.29% / +13.07% | +27.94% / +13.41% |
| Δ Stoch. · w/ compile | +9.47% / +2.33% | +19.61% / +15.50% | +14.81% / +4.39% |
| Δ Dyn. · w/o compile | −4.11% / −6.52% | −6.18% / −6.56% | −1.00% / −5.85% |
| Δ Dyn. · w/ compile | −12.63% / −16.88% | −1.96% / −1.75% | −7.25% / −10.60% |
Turbo LoRA
832×1216 · er_sde simple · CFG 1.0 · 10 steps · Turbo LoRA
| | RTX 3060 | RTX 3090 | RTX 5090 |
|---|---|---|---|
| BF16 · w/o compile | 1.44 it/s / 7.57s | 3.55 it/s / 4.10s | 10.21 it/s / 1.59s |
| BF16 · w/ compile | 1.88 it/s / 5.92s | 4.60 it/s / 3.05s | 12.76 it/s / 1.38s |
| INT8 (Stoch.) · w/o compile | 1.73 it/s / 8.55s | 4.15 it/s / 3.32s | 13.08 it/s / 1.78s |
| INT8 (Stoch.) · w/ compile | 2.38 it/s / 6.75s | 5.67 it/s / 2.66s | 14.44 it/s / 1.73s |
| INT8 (Dyn.) · w/o compile | 1.38 it/s / 8.74s | 3.32 it/s / 3.85s | 10.00 it/s / 1.82s |
| INT8 (Dyn.) · w/ compile | 1.38 it/s / 8.94s | 4.69 it/s / 3.01s | 13.59 it/s / 1.53s |
| Δ Stoch. · w/o compile | +20.14% / −12.95% | +16.90% / +19.02% | +28.11% / −11.95% |
| Δ Stoch. · w/ compile | +26.60% / −14.02% | +23.26% / +12.79% | +13.17% / −25.36% |
| Δ Dyn. · w/o compile | −4.17% / −15.46% | −6.48% / +6.10% | −2.06% / −14.47% |
| Δ Dyn. · w/ compile | −26.60% / −51.01% | +1.96% / +1.31% | +6.50% / −10.87% |
Δ it/s = (INT8 − BF16) / BF16 × 100 · Δ Time = (BF16 − INT8) / BF16 × 100 · positive = INT8 faster
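As a quick check of how the Δ rows are derived, the two formulas above can be written as small Python helpers. The function names are illustrative; the sample numbers are taken from the RTX 5090 column of the No LoRA table (w/ compile).

```python
def delta_its(bf16_its, int8_its):
    """Relative change in iterations per second; positive means INT8 is faster."""
    return (int8_its - bf16_its) / bf16_its * 100


def delta_time(bf16_seconds, int8_seconds):
    """Relative change in total time; positive means INT8 finishes sooner."""
    return (bf16_seconds - int8_seconds) / bf16_seconds * 100


# RTX 5090, No LoRA, w/ compile: BF16 6.35 it/s / 5.37s, INT8 8.51 it/s / 4.26s
print(f"{delta_its(6.35, 8.51):+.2f}% / {delta_time(5.37, 4.26):+.2f}%")
# prints: +34.02% / +20.67%
```

The two percentages differ because the reported time covers the whole generation, not only the sampling loop that it/s measures.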
How to use
- Clone ComfyUI-INT8-Fast into the `custom_nodes` directory.
- Running ComfyUI with the `--disable-dynamic-vram` option is recommended.
- Use the `Load Diffusion Model INT8 (W8A8)` node to load the model, and set `on_the_fly_quantization` to False (default).
- "Stochastic" is recommended when using a LoRA.
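The card does not define the "Stochastic" LoRA mode. Assuming it refers to stochastic rounding when a LoRA delta is merged into already-quantized INT8 weights (an assumption, not confirmed by this card), the sketch below shows why that can help: a delta smaller than half a quantization step is always lost with round-to-nearest, but is preserved on average with stochastic rounding. All names and values here are hypothetical.

```python
import math
import random


def round_nearest(x):
    """Deterministic round-half-up to an integer quantization level."""
    return math.floor(x + 0.5)


def stochastic_round(x, rng):
    """Round up with probability equal to the fractional part, else round down."""
    lower = math.floor(x)
    return lower + (1 if rng.random() < (x - lower) else 0)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical case: an INT8 weight at level 10 plus a LoRA delta of 0.2 steps.
merged = 10.2

nearest = round_nearest(merged)  # the 0.2 delta is dropped every time
samples = [stochastic_round(merged, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # close to 10.2 on average

print("round-to-nearest:", nearest)
print("stochastic mean: ", mean)
```

Each individual weight still lands on an integer level, but across many weights the expected value matches the unquantized merge.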
Quantized layers
INT8Tensorwise
{
"format": "comfy_quant",
"block_names": ["net.blocks."],
"rules": [
{ "policy": "keep", "match": ["blocks.0", "adaln_modulation", ".mlp.layer2"] },
{ "policy": "int8_tensorwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
]
}
INT8Rowwise
{
"format": "comfy_quant",
"block_names": ["net.blocks."],
"rules": [
{ "policy": "keep", "match": [
"blocks.0.", "blocks.27.", "adaln_modulation",
".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp"
]},
{ "policy": "int8_rowwise", "match": ["q_proj", "k_proj", "v_proj", "output_proj", ".mlp"] }
]
}
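One way to read the rule lists above is as an ordered set of substring matches applied to each layer name. The sketch below encodes that reading for the INT8Rowwise configuration. The matching semantics (substring match, first matching rule wins, layers outside `block_names` left untouched) are assumptions rather than the documented behavior of the quantization tool, and the layer names are hypothetical examples.

```python
BLOCK_NAMES = ["net.blocks."]

RULES = [
    {"policy": "keep", "match": [
        "blocks.0.", "blocks.27.", "adaln_modulation",
        ".0.mlp", ".1.mlp", ".2.mlp", ".3.mlp",
    ]},
    {"policy": "int8_rowwise", "match": [
        "q_proj", "k_proj", "v_proj", "output_proj", ".mlp",
    ]},
]


def resolve_policy(layer_name, rules=RULES, block_names=BLOCK_NAMES):
    """Return the policy of the first rule whose pattern occurs in the name."""
    # Assumption: layers outside the listed blocks are never quantized.
    if not any(prefix in layer_name for prefix in block_names):
        return "keep"
    for rule in rules:
        if any(pattern in layer_name for pattern in rule["match"]):
            return rule["policy"]
    # Assumption: unmatched layers stay in their original precision.
    return "keep"


# Hypothetical layer names, for illustration only.
for name in [
    "net.blocks.0.self_attn.q_proj.weight",    # first block: keep
    "net.blocks.2.mlp.layer1.weight",          # early MLP: keep
    "net.blocks.12.mlp.layer1.weight",         # later MLP: int8_rowwise
    "net.blocks.5.self_attn.q_proj.weight",    # attention projection: int8_rowwise
    "net.blocks.5.adaln_modulation.1.weight",  # modulation: keep
]:
    print(f"{name} -> {resolve_policy(name)}")
```

Under this reading, the order of the rules matters: the `keep` rule comes first so that it can shield the first and last blocks and the early MLPs from the broader `.mlp` pattern in the quantization rule.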
Model tree for Bedovyy/Anima-INT8
- Base model: nvidia/Cosmos-Predict2-2B-Text2Image
- Finetuned: circlestone-labs/Anima