---
license: other
license_name: circlestone-labs-non-commercial-license
base_model:
- circlestone-labs/Anima
pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusers
- safetensors
- sdnq
- anima
- cosmos
- text-to-image
- int8
---
# Anima Preview 3 SDNQ INT8 Diffusers Checkpoint
An 8-bit (int8) dynamic SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline. This is the fastest checkpoint measured in this comparison; the companion checkpoints are listed in the benchmark table below.
This repository is a separate, full Diffusers checkpoint for circlestone-labs/Anima Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers. The `transformer/` component is the SDNQ-quantized diffusion transformer converted from WaveCut/Anima-Preview-3-SDNQ-int8.
## Components
- `transformer/`: SDNQ int8-quantized `CosmosTransformer3DModel`.
- `llm_adapter/`: Anima LLM adapter required by the native Anima architecture.
- `text_encoder/`: Qwen3 0.6B text encoder from the Diffusers conversion.
- `tokenizer/` and `t5_tokenizer/`: Qwen and T5 tokenizers used by the adapter pathway.
- `vae/`: Qwen Image / Wan-style VAE used by Anima.
- `scheduler/`: `FlowMatchEulerDiscreteScheduler` with shift 3.0.
## Usage
Install current Diffusers/Transformers plus SDNQ support, then load the pipeline:
```python
import torch
import sdnq  # noqa: F401 -- registers SDNQ quantized layers with Diffusers
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "WaveCut/Anima-Preview-3-SDNQ-int8-diffusers",
    custom_pipeline="pipeline",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

prompt = "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer"
negative_prompt = "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(424242),
).images[0]
```
Because the Anima pipeline is custom code, pass `custom_pipeline="pipeline"`; `trust_remote_code=True` allows Diffusers to load `pipeline.py` from this repository.
## Prompting
Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. The upstream Anima Preview 3 card recommends generating at about 1 MP (for example 1024x1024, 896x1152, or 1152x896) with roughly 30-50 steps and CFG 4-5.
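As a rough illustration of the ~1 MP guidance, here is a hypothetical helper (the name `size_for_aspect` and the multiple-of-64 snapping are assumptions, not part of this repo) that picks a width/height pair near a pixel budget for a given aspect ratio:

```python
def size_for_aspect(aspect: float, target_pixels: int = 1024 * 1024,
                    multiple: int = 64) -> tuple[int, int]:
    """Pick (width, height) with width/height ~= aspect and
    width*height ~= target_pixels, snapped to a multiple of 64
    (a common latent-grid constraint; assumption, not from the card)."""
    height = (target_pixels / aspect) ** 0.5
    width = aspect * height
    snap = lambda v: max(multiple, round(v / multiple) * multiple)
    return snap(width), snap(height)

size_for_aspect(1.0)         # -> (1024, 1024)
size_for_aspect(896 / 1152)  # -> (896, 1152)
```

The two calls reproduce the resolutions the upstream card suggests.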
Recommended positive prefix:
`masterpiece, best quality, score_7, safe,`
Recommended negative prompt:
`worst quality, low quality, score_1, score_2, score_3, artist name`
Use lowercase tags with spaces instead of underscores, except score tags such as score_7. For artist tags, prefix the artist with @.
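These tag conventions are easy to mechanize; the following is a hypothetical sketch (`normalize_tags` is illustrative, not part of this repo) that lowercases tags and converts underscores to spaces while leaving score tags untouched:

```python
import re

_SCORE_TAG = re.compile(r"^score_\d+$")

def normalize_tags(tags):
    """Lowercase each tag and replace underscores with spaces,
    except score tags such as score_7, which keep their underscore."""
    out = []
    for tag in tags:
        tag = tag.strip().lower()
        if not _SCORE_TAG.match(tag):
            tag = tag.replace("_", " ")
        out.append(tag)
    return ", ".join(out)

normalize_tags(["Masterpiece", "best_quality", "score_7", "purple_hair"])
# -> "masterpiece, best quality, score_7, purple hair"
```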
## 1024x1024 Comparison Grid
Five prompt/seed pairs were generated with the original BF16 Diffusers checkpoint, the companion UINT4 checkpoint, and this INT8 checkpoint. The source JPEG is 3572x5576; every generated cell is exactly 1024x1024 and pasted 1:1 with no resizing.
Prompt IDs and seeds are printed in the left column of the grid. Raw benchmark data is available in `benchmarks/benchmark_results_1024.json`.
## Benchmark
Measured on an RTX 5090 (32 GB) with torch 2.8.0+cu128, diffusers 0.38.0, transformers 5.8.1, sdnq 0.1.8, `torch.bfloat16`, 24 steps, CFG 4.0, and 1024x1024 output. Network download time is excluded. Each model was loaded in a separate process; one 1024x1024 warm-up image was discarded, then five prompt/seed pairs were measured. VRAM was sampled with `nvidia-smi` every 50 ms.
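The 50 ms nvidia-smi sampling described above could look roughly like this sketch (class and function names are illustrative, not the actual benchmark harness):

```python
import subprocess
import threading

def parse_vram_mib(nvidia_smi_output: str) -> int:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`:
    one integer (MiB) per GPU, one GPU per line; take the first GPU."""
    return int(nvidia_smi_output.strip().splitlines()[0])

def read_vram_mib() -> int:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_vram_mib(out)

class PeakVramSampler:
    """Poll nvidia-smi on a background thread and record the peak reading."""

    def __init__(self, interval_s: float = 0.05):
        self.interval_s = interval_s
        self.peak_mib = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak_mib = max(self.peak_mib, read_vram_mib())
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
```

Usage would be along the lines of `with PeakVramSampler() as s: image = pipe(...)`, then reading `s.peak_mib` after the block exits.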
| Model | Repo | Size | Load time | Mean generation | Speed vs original | VRAM after load | Peak VRAM while generating |
|---|---|---|---|---|---|---|---|
| Original BF16 | CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers | 5.3 GiB | 10.04 s | 6.37 s/img | 1.00x | 6005 MiB | 10759 MiB |
| SDNQ UINT4 | WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers | 2.7 GiB (-49.1%) | 11.96 s | 6.13 s/img | 1.04x (+3.9%) | 3285 MiB (-45.3%) | 8157 MiB (-24.2%) |
| SDNQ INT8 | WaveCut/Anima-Preview-3-SDNQ-int8-diffusers | 3.5 GiB (-34.1%) | 22.41 s | 4.60 s/img | 1.38x (+38.4%) | 4111 MiB (-31.5%) | 8961 MiB (-16.7%) |
Quant-to-quant tradeoff in this run: UINT4 is 22.7% smaller than INT8 and uses 826 MiB less VRAM after load plus 804 MiB less peak generation VRAM. INT8 is 1.33x faster than UINT4 on this RTX 5090 setup.
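The speed ratios quoted above follow directly from the mean per-image times in the table (values copied from this run):

```python
# Mean seconds per image, from the benchmark table above.
bf16_s, uint4_s, int8_s = 6.37, 6.13, 4.60

print(round(bf16_s / int8_s, 2))   # 1.38 -> INT8 speedup vs original BF16
print(round(uint4_s / int8_s, 2))  # 1.33 -> INT8 speedup vs UINT4
```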
## Notes
The original Anima split checkpoint is a ComfyUI-native model with a Qwen3 text encoder and a learned LLM adapter. Earlier transformer-only exports that load the checkpoint directly as `CosmosTransformer3DModel` ignore the `llm_adapter.*` weights; this repo keeps the adapter and the full pipeline structure so generation follows the Anima architecture.
License follows the upstream Anima/CircleStone non-commercial license and the NVIDIA Cosmos derivative terms referenced by the upstream model card.
