---
license: other
license_name: circlestone-labs-non-commercial-license
base_model:
- circlestone-labs/Anima
pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusers
- safetensors
- sdnq
- anima
- cosmos
- text-to-image
- int8
---
# Anima Preview 3 SDNQ INT8 Diffusers Checkpoint
An 8-bit (int8) dynamic SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline. This is the fastest checkpoint measured in this comparison; the companion checkpoints are listed in the benchmark table below.
This repository is a separate, full Diffusers checkpoint for circlestone-labs/Anima Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers. The `transformer/` component is the SDNQ-quantized diffusion transformer converted from WaveCut/Anima-Preview-3-SDNQ-int8.
## Components
- `transformer/`: SDNQ int8-quantized `CosmosTransformer3DModel`.
- `llm_adapter/`: Anima LLM adapter required by the native Anima architecture.
- `text_encoder/`: Qwen3 0.6B text encoder from the Diffusers conversion.
- `tokenizer/` and `t5_tokenizer/`: Qwen and T5 tokenizers used by the adapter pathway.
- `vae/`: Qwen Image / Wan-style VAE used by Anima.
- `scheduler/`: `FlowMatchEulerDiscreteScheduler` with shift 3.0.
## Usage
Install current Diffusers/Transformers plus SDNQ support, then load the pipeline:
```python
import torch
import sdnq  # noqa: F401 -- registers SDNQ quantized layers with Diffusers
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "WaveCut/Anima-Preview-3-SDNQ-int8-diffusers",
    custom_pipeline="pipeline",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

prompt = "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer"
negative_prompt = "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(424242),
).images[0]
```
Because the Anima pipeline is custom code, pass `custom_pipeline="pipeline"`; `trust_remote_code=True` allows Diffusers to load `pipeline.py` from this repository.
## Prompting
Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. The upstream Anima Preview 3 card recommends generating at about 1 MP (for example 1024x1024, 896x1152, or 1152x896) with roughly 30-50 steps and CFG 4-5.
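As a rough illustration of the ~1 MP guidance, here is a hypothetical helper (the name `size_for_aspect` and the multiple-of-64 snapping are assumptions, not part of this repo) that picks a width/height pair near a pixel budget for a given aspect ratio:

```python
def size_for_aspect(aspect: float, target_pixels: int = 1024 * 1024,
                    multiple: int = 64) -> tuple[int, int]:
    """Pick (width, height) with width/height ~= aspect and
    width*height ~= target_pixels, snapped to a multiple of 64
    (a common latent-grid constraint; assumption, not from the card)."""
    height = (target_pixels / aspect) ** 0.5
    width = aspect * height
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

size_for_aspect(1.0)         # -> (1024, 1024)
size_for_aspect(896 / 1152)  # -> (896, 1152)
```

The two calls reproduce the resolutions the upstream card suggests.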
Recommended positive prefix:
`masterpiece, best quality, score_7, safe,`
Recommended negative prompt:
`worst quality, low quality, score_1, score_2, score_3, artist name`
Use lowercase tags with spaces instead of underscores, except score tags such as score_7. For artist tags, prefix the artist with @.
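These tag conventions are easy to mechanize; the following is a hypothetical sketch (`normalize_tags` is illustrative, not part of this repo) that lowercases tags and converts underscores to spaces while leaving score tags untouched:

```python
import re

_SCORE_TAG = re.compile(r"^score_\d+$")

def normalize_tags(tags):
    """Lowercase each tag and replace underscores with spaces,
    except score tags such as score_7, which keep their underscore."""
    out = []
    for tag in tags:
        tag = tag.strip().lower()
        if not _SCORE_TAG.match(tag):
            tag = tag.replace("_", " ")
        out.append(tag)
    return ", ".join(out)

normalize_tags(["Masterpiece", "best_quality", "score_7", "purple_hair"])
# -> "masterpiece, best quality, score_7, purple hair"
```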
## 1024x1024 Comparison Grid
Five prompt/seed pairs were generated with the original BF16 Diffusers checkpoint, the companion UINT4 checkpoint, and this INT8 checkpoint. The source JPEG is 3572x5576; every generated cell is exactly 1024x1024 and pasted 1:1 with no resizing.
Prompt IDs and seeds are printed in the left column of the grid. Raw benchmark data is available in `benchmarks/benchmark_results_1024.json`.
## Benchmark
Measured on an RTX 5090 (32 GB) with torch 2.8.0+cu128, diffusers 0.38.0, transformers 5.8.1, sdnq 0.1.8, `torch.bfloat16`, 24 steps, CFG 4.0, and 1024x1024 output. Network download time is excluded. Each model was loaded in a separate process; one 1024x1024 warm-up image was discarded, then five prompt/seed pairs were measured. VRAM was sampled with `nvidia-smi` every 50 ms.
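The 50 ms nvidia-smi sampling described above could look roughly like this sketch (class and function names are illustrative, not the actual benchmark harness):

```python
import subprocess
import threading

def parse_vram_mib(nvidia_smi_output: str) -> int:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`:
    one integer (MiB) per GPU, one GPU per line; take the first GPU."""
    return int(nvidia_smi_output.strip().splitlines()[0])

def read_vram_mib() -> int:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_vram_mib(out)

class PeakVramSampler:
    """Poll nvidia-smi on a background thread and record the peak reading."""

    def __init__(self, interval_s: float = 0.05):
        self.interval_s = interval_s
        self.peak_mib = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak_mib = max(self.peak_mib, read_vram_mib())
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
```

Usage would be along the lines of `with PeakVramSampler() as s: image = pipe(...)`, then reading `s.peak_mib` after the block exits.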
| Model | Repo | Size | Load time | Mean generation | Speed vs original | VRAM after load | Peak VRAM while generating |
|---|---|---|---|---|---|---|---|
| Original BF16 | CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers | 5.3 GiB | 10.04 s | 6.37 s/img | 1.00x | 6005 MiB | 10759 MiB |
| SDNQ UINT4 | WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers | 2.7 GiB (-49.1%) | 11.96 s | 6.13 s/img | 1.04x (+3.9%) | 3285 MiB (-45.3%) | 8157 MiB (-24.2%) |
| SDNQ INT8 | WaveCut/Anima-Preview-3-SDNQ-int8-diffusers | 3.5 GiB (-34.1%) | 22.41 s | 4.60 s/img | 1.38x (+38.4%) | 4111 MiB (-31.5%) | 8961 MiB (-16.7%) |
Quant-to-quant tradeoff in this run: UINT4 is 22.7% smaller than INT8 and uses 826 MiB less VRAM after load plus 804 MiB less peak generation VRAM. INT8 is 1.33x faster than UINT4 on this RTX 5090 setup.
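The speed ratios quoted above follow directly from the mean per-image times in the table (values copied from this run):

```python
# Mean seconds per image, from the benchmark table above.
bf16_s, uint4_s, int8_s = 6.37, 6.13, 4.60

print(round(bf16_s / int8_s, 2))   # 1.38 -> INT8 speedup vs original BF16
print(round(uint4_s / int8_s, 2))  # 1.33 -> INT8 speedup vs UINT4
```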
## Notes
The original Anima split checkpoint is a ComfyUI-native model with a Qwen3 text encoder and a learned LLM adapter. Earlier transformer-only exports that load the checkpoint directly as `CosmosTransformer3DModel` ignore the `llm_adapter.*` weights; this repo keeps the adapter and the full pipeline structure so generation follows the Anima architecture.
License follows the upstream Anima/CircleStone non-commercial license and the NVIDIA Cosmos derivative terms referenced by the upstream model card.
