# Anima Preview 3 SDNQ INT8 Transformer Component
Dynamic INT8 SDNQ quantization of the Anima Preview 3 diffusion transformer component from circlestone-labs/Anima (split_files/diffusion_models/anima-preview3-base.safetensors). This quantization is the fastest measured option in the companion full-pipeline benchmark below.
This is a Diffusers transformer component repo, not a standalone text-to-image pipeline and not a ComfyUI-native single-file checkpoint. It contains config.json, SDNQ quantized safetensors shards, and quantization_config.json for diffusers.CosmosTransformer3DModel.
For a runnable full Diffusers pipeline with Anima's text encoder, VAE, custom pipeline.py, and llm_adapter, use the companion repo: WaveCut/Anima-Preview-3-SDNQ-int8-diffusers.
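A hedged sketch of what loading that companion full pipeline might look like (the `trust_remote_code` flag is assumed to be required for the repo's custom pipeline.py; 24 steps and CFG 4.0 mirror the benchmark settings below; this is not a verified recipe):

```python
# Hedged sketch: generate with the companion full-pipeline repo.
# trust_remote_code=True is assumed to be needed for the custom pipeline.py.
repo_id = "WaveCut/Anima-Preview-3-SDNQ-int8-diffusers"
prompt = "masterpiece, best quality, score_7, safe, 1girl, jungle, cold color palette"

def generate(repo_id: str, prompt: str, device: str = "cuda"):
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        repo_id,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,  # repo ships a custom pipeline.py
    ).to(device)
    return pipe(prompt, num_inference_steps=24, guidance_scale=4.0).images[0]
```

Calling `generate(repo_id, prompt)` downloads roughly 3.5 GiB on first use and needs a CUDA device (or "mps" on Apple silicon).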
## Component Load Test
```python
import torch
import sdnq  # importing sdnq registers SDNQ quantized-weight support with diffusers
from diffusers import CosmosTransformer3DModel

transformer = CosmosTransformer3DModel.from_pretrained(
    "WaveCut/Anima-Preview-3-SDNQ-int8",
    torch_dtype=torch.bfloat16,
).to("cuda")
```
Component-only smoke test on RTX 5090 32GB:
| Component | Size | Load time | VRAM after load | Peak VRAM while loading |
|---|---|---|---|---|
| Original Anima Preview 3 diffusion model | 3.89 GiB | not measured here | not measured here | not measured here |
| SDNQ UINT4 component | 1.06 GiB (-72.8%) | 2.20s | 1611 MiB | 1611 MiB |
| SDNQ INT8 component | 1.85 GiB (-52.6%) | 12.18s | 2437 MiB | 2437 MiB |
Raw component load data: benchmarks/component_load_tests.json.
## ComfyUI Test
Native original Anima ComfyUI baseline was verified with ComfyUI commit 8505abf52e42f4441d9d53baf4c31a2ec7123400 using:
- UNETLoader: anima-preview3-base.safetensors
- CLIPLoader: qwen_3_06b_base.safetensors
- VAELoader: qwen_image_vae.safetensors
- ModelSamplingAuraFlow: shift 3.0
- KSampler: er_sde, simple, 24 steps, CFG 4.0
- Resolution: 1024x1024
Original ComfyUI baseline on the same five prompt/seed pairs: mean 6.53s/img, peak generation VRAM 26519 MiB. ComfyUI keeps the model mostly lazy/offloaded after loader nodes, so the meaningful memory number is the generation peak. Raw data: benchmarks/comfy_original_baseline_1024.json.
Direct ComfyUI loading of this component repo was also tested through comfyui-sdnq's SDNQSampler custom path. It is not directly loadable there because that node expects a full Diffusers pipeline directory with model_index.json; this repo is only the transformer component. ComfyUI core UNETLoader also expects a single diffusion model file and Anima detection requires llm_adapter.* weights, which are not present in this component repo. Test log: benchmarks/comfy_direct_load_tests.json.
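The distinction the loaders enforce can be checked before pointing one at a local download; a minimal sketch (the helper name is ours):

```python
from pathlib import Path

def classify_diffusers_dir(path: str) -> str:
    """Rough check: a full Diffusers pipeline carries model_index.json at its
    root, while a single component such as this repo only has config.json."""
    root = Path(path)
    if (root / "model_index.json").is_file():
        return "full pipeline"
    if (root / "config.json").is_file():
        return "component"
    return "unknown"
```

For a local snapshot of this repo it returns "component", which is why the SDNQSampler path rejects it.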
## Full-Pipeline Generation Benchmark
The generation benchmark below uses the companion full Diffusers checkpoints, where these transformer components are combined with Anima's original Qwen3 text encoder, VAE, and learned LLM adapter. This is the runnable comparison against the original full BF16 Diffusers conversion.
The source JPEG is 3572x5576; every generated cell is exactly 1024x1024 and pasted 1:1 with no resizing. Five prompt/seed pairs are listed in the grid's left column. Raw benchmark JSON: benchmarks/full_diffusers_benchmark_results_1024.json.
Measured on RTX 5090 32GB with torch 2.8.0+cu128, diffusers 0.38.0, transformers 5.8.1, sdnq 0.1.8, torch.bfloat16, 24 steps, CFG 4.0, and 1024x1024 output. Network download excluded; one warm-up image discarded; VRAM sampled with nvidia-smi every 50 ms.
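The 50 ms `nvidia-smi` sampling can be reproduced with a small poller; a sketch (the query flags are standard `nvidia-smi` options; function names are ours):

```python
import subprocess
import time

def parse_used_mib(smi_output: str) -> int:
    """Parse `--query-gpu=memory.used --format=csv,noheader,nounits` output
    (one integer MiB value per GPU per line) and return the largest."""
    return max(int(line) for line in smi_output.split())

def peak_vram_mib(duration_s: float, interval_s: float = 0.05) -> int:
    """Poll nvidia-smi every `interval_s` seconds and track peak memory.used."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        peak = max(peak, parse_used_mib(out))
        time.sleep(interval_s)
    return peak
```

In a benchmark harness the poller runs in a background thread while generation executes on the main thread.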
| Model | Repo | Size | Load time | Mean generation | Speed vs original | VRAM after load | Peak VRAM while generating |
|---|---|---|---|---|---|---|---|
| Original BF16 | CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers | 5.3 GiB | 10.04s | 6.37s/img | 1.00x | 6005 MiB | 10759 MiB |
| SDNQ UINT4 full pipeline | WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers | 2.7 GiB (-49.1%) | 11.96s | 6.13s/img | 1.04x (+3.9%) | 3285 MiB (-45.3%) | 8157 MiB (-24.2%) |
| SDNQ INT8 full pipeline | WaveCut/Anima-Preview-3-SDNQ-int8-diffusers | 3.5 GiB (-34.1%) | 22.41s | 4.60s/img | 1.38x (+38.4%) | 4111 MiB (-31.5%) | 8961 MiB (-16.7%) |
Quant-to-quant tradeoff in the full-pipeline run: UINT4 is 22.7% smaller than INT8 and uses 826 MiB less VRAM after load plus 804 MiB less peak generation VRAM. INT8 is 1.33x faster than UINT4 on this RTX 5090 setup.
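The tradeoff figures can be rechecked from the table rows (small differences come from the GiB sizes being rounded):

```python
# Re-derive the UINT4-vs-INT8 comparison from the full-pipeline table.
uint4 = {"size_gib": 2.7, "s_per_img": 6.13, "load_mib": 3285, "gen_mib": 8157}
int8 = {"size_gib": 3.5, "s_per_img": 4.60, "load_mib": 4111, "gen_mib": 8961}

size_saving_pct = (1 - uint4["size_gib"] / int8["size_gib"]) * 100  # ~22.9, quoted as 22.7
int8_speedup = uint4["s_per_img"] / int8["s_per_img"]               # ~1.33x
load_delta_mib = int8["load_mib"] - uint4["load_mib"]               # 826 MiB
gen_delta_mib = int8["gen_mib"] - uint4["gen_mib"]                  # 804 MiB
print(size_saving_pct, int8_speedup, load_delta_mib, gen_delta_mib)
```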
## Prompting
Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. The upstream Anima Preview 3 card recommends about 1MP generation, for example 1024x1024, 896x1152, or 1152x896, with roughly 30-50 steps and CFG 4-5.
Recommended positive prefix:

```
masterpiece, best quality, score_7, safe,
```

Recommended negative prompt:

```
worst quality, low quality, score_1, score_2, score_3, artist name
```
Use lowercase tags with spaces instead of underscores, except score tags such as score_7. For artist tags, prefix the artist with @.
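The tag conventions above can be captured in a small normalizer; a sketch (helper names are ours, and leaving `@artist` tags entirely untouched is an assumption beyond the stated rules):

```python
POSITIVE_PREFIX = "masterpiece, best quality, score_7, safe"
NEGATIVE_PROMPT = "worst quality, low quality, score_1, score_2, score_3, artist name"

def normalize_tag(tag: str) -> str:
    """Lowercase and replace underscores with spaces, except for score tags
    (score_7 etc.) and @-prefixed artist tags, which are kept as-is."""
    tag = tag.strip()
    if tag.startswith("@"):
        return tag
    tag = tag.lower()
    if tag.startswith("score_"):
        return tag
    return tag.replace("_", " ")

def build_prompt(tags: list[str]) -> str:
    return ", ".join([POSITIVE_PREFIX] + [normalize_tag(t) for t in tags])
```

For example, `build_prompt(["1girl", "Long_Hair"])` yields `masterpiece, best quality, score_7, safe, 1girl, long hair`.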
## Notes
The original Anima split checkpoint is a ComfyUI-native model with a Qwen3 text encoder and a learned LLM adapter. Earlier transformer-only exports that load the checkpoint directly as CosmosTransformer3DModel ignore the llm_adapter.* weights; this component repo intentionally only stores the quantized transformer. Use the companion full Diffusers checkpoint for generation.
License follows the upstream Anima/CircleStone non-commercial license and the NVIDIA Cosmos derivative terms referenced by the upstream model card.