---
license: other
license_name: circlestone-labs-non-commercial-license
base_model:
- circlestone-labs/Anima
pipeline_tag: text-to-image
library_name: diffusers
tags:
- diffusers
- safetensors
- sdnq
- anima
- cosmos
- text-to-image
- uint4
---

# Anima Preview 3 SDNQ UINT4 Diffusers Checkpoint

4-bit uint4 static SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline. This is the smallest checkpoint in this comparison, with the lowest VRAM footprint; the companion checkpoints are listed in the benchmark table below.

This repository is a separate, full Diffusers checkpoint for `circlestone-labs/Anima` Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers`. The `transformer/` component is the SDNQ-quantized diffusion transformer converted from `WaveCut/Anima-Preview-3-SDNQ-uint4`.

## Components

- `transformer/`: SDNQ `uint4` quantized `CosmosTransformer3DModel`.
- `llm_adapter/`: Anima LLM adapter required by the native Anima architecture.
- `text_encoder/`: Qwen3 0.6B text encoder from the Diffusers conversion.
- `tokenizer/` and `t5_tokenizer/`: Qwen and T5 tokenizers used by the adapter pathway.
- `vae/`: Qwen Image / Wan-style VAE used by Anima.
- `scheduler/`: `FlowMatchEulerDiscreteScheduler` with shift 3.0.

## Usage

Install current Diffusers/Transformers plus SDNQ support, then load the pipeline:

```python
import torch
import sdnq  # enables loading the SDNQ-quantized transformer weights
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers",
    custom_pipeline="pipeline",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

prompt = "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer"
negative_prompt = "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    num_inference_steps=30,
    guidance_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(424242),
).images[0]
```

Because the Anima pipeline is custom code, pass `custom_pipeline="pipeline"`; `trust_remote_code=True` allows Diffusers to load `pipeline.py` from this repo.

## Prompting

Anima was trained on Danbooru-style tags, natural-language captions, and mixtures of both. The upstream Anima Preview 3 card recommends generating at about 1 MP, for example `1024x1024`, `896x1152`, or `1152x896`, with roughly 30-50 steps and CFG 4-5.

Recommended positive prefix:

```text
masterpiece, best quality, score_7, safe,
```

Recommended negative prompt:

```text
worst quality, low quality, score_1, score_2, score_3, artist name
```

Use lowercase tags with spaces instead of underscores, except for score tags such as `score_7`. For artist tags, prefix the artist name with `@`.
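The tag conventions above can be sketched as a tiny helper (hypothetical code, not part of this repo), assuming the stated rules: lowercase, underscores replaced by spaces except in `score_*` tags, and an `@` prefix for artist tags:

```python
def normalize_tag(tag: str, artist: bool = False) -> str:
    """Apply the card's tag conventions to one Danbooru-style tag."""
    tag = tag.strip().lower()
    if not tag.startswith("score_"):  # score tags keep their underscore
        tag = tag.replace("_", " ")
    if artist:  # artist tags are prefixed with @
        tag = "@" + tag
    return tag


def build_prompt(tags, artists=()):
    """Join tags behind the recommended positive prefix."""
    prefix = ["masterpiece", "best quality", "score_7", "safe"]
    body = [normalize_tag(t) for t in tags]
    body += [normalize_tag(a, artist=True) for a in artists]
    return ", ".join(prefix + body)


print(build_prompt(["1girl", "purple_hair", "looking_at_viewer"]))
# masterpiece, best quality, score_7, safe, 1girl, purple hair, looking at viewer
```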

## 1024x1024 Comparison Grid

Five prompt/seed pairs were generated with the original BF16 Diffusers checkpoint, this UINT4 checkpoint, and the companion INT8 checkpoint. The source JPEG is `3572x5576`; every generated cell is exactly `1024x1024` and pasted 1:1 with no resizing.
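The source dimensions are consistent with that layout: three 1024 px model columns plus a left label column, and five 1024 px rows plus a header band. A quick check (the margin widths are inferred here, not stated in the card):

```python
cell = 1024
cols, rows = 3, 5           # BF16 / UINT4 / INT8 columns, five prompt/seed rows
width, height = 3572, 5576  # source JPEG dimensions

label_col = width - cols * cell  # leftover width -> label column
header = height - rows * cell    # leftover height -> header band
print(label_col, header)  # 500 456
```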

![](benchmarks/comparison_grid_1024.jpg)

Prompt IDs and seeds are printed in the left column of the grid. Raw benchmark data is available in [`benchmarks/benchmark_results_1024.json`](benchmarks/benchmark_results_1024.json).

## Benchmark

Measured on an RTX 5090 32GB with `torch 2.8.0+cu128`, `diffusers 0.38.0`, `transformers 5.8.1`, `sdnq 0.1.8`, `torch.bfloat16`, 24 steps, CFG 4.0, and 1024x1024 output. Network download time is excluded. Each model was loaded in a separate process; one 1024x1024 warm-up image was discarded, then five prompt/seed pairs were measured. VRAM was sampled with `nvidia-smi` every 50 ms.
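The 50 ms VRAM sampling can be approximated with a polling loop over `nvidia-smi`, roughly like the sketch below (a hypothetical re-implementation, not the actual benchmark script):

```python
import shutil
import subprocess
import time


def parse_vram_mib(line: str) -> int:
    """Parse one '3285 MiB' line from
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader`."""
    return int(line.strip().split()[0])


def sample_peak_vram(stop, interval_s: float = 0.05) -> int:
    """Poll used VRAM every `interval_s` seconds until stop() is truthy;
    return the peak in MiB across all GPUs."""
    peak = 0
    while not stop():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
        peak = max([peak] + [parse_vram_mib(l) for l in out.splitlines() if l.strip()])
        time.sleep(interval_s)
    return peak


if __name__ == "__main__" and shutil.which("nvidia-smi"):
    # Demo: sample for one second; a benchmark would run this alongside generation.
    t_end = time.monotonic() + 1.0
    print(sample_peak_vram(lambda: time.monotonic() > t_end), "MiB peak")
```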

| Model | Repo | Size | Load time | Mean generation | Speed vs original | VRAM after load | Peak VRAM while generating |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
| Original BF16 | `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers` | 5.3 GiB | 10.04s | 6.37s/img | 1.00x | 6005 MiB | 10759 MiB |
| SDNQ UINT4 | `WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers` | 2.7 GiB (-49.1%) | 11.96s | 6.13s/img | 1.04x (+3.9%) | 3285 MiB (-45.3%) | 8157 MiB (-24.2%) |
| SDNQ INT8 | `WaveCut/Anima-Preview-3-SDNQ-int8-diffusers` | 3.5 GiB (-34.1%) | 22.41s | 4.60s/img | 1.38x (+38.4%) | 4111 MiB (-31.5%) | 8961 MiB (-16.7%) |

Quant-to-quant tradeoff in this run: UINT4 is 22.7% smaller than INT8, uses 826 MiB less VRAM after load, and uses 804 MiB less peak VRAM while generating. INT8 is 1.33x faster than UINT4 on this RTX 5090 setup.
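For reference, those deltas can be recomputed from the table rows (the size percentage comes out slightly different here because it uses the rounded GiB figures; the 22.7% in the text presumably uses exact byte counts):

```python
uint4 = dict(size_gib=2.7, gen_s=6.13, load_mib=3285, peak_mib=8157)
int8 = dict(size_gib=3.5, gen_s=4.60, load_mib=4111, peak_mib=8961)

print(int8["load_mib"] - uint4["load_mib"])  # 826 MiB less VRAM after load
print(int8["peak_mib"] - uint4["peak_mib"])  # 804 MiB less peak VRAM
print(round(uint4["gen_s"] / int8["gen_s"], 2))  # 1.33 -> INT8 is ~1.33x faster
print(round((1 - uint4["size_gib"] / int8["size_gib"]) * 100, 1))  # 22.9 (~22.7% from exact sizes)
```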

## Notes

The original Anima split checkpoint is a ComfyUI-native model with a Qwen3 text encoder and a learned LLM adapter. Earlier transformer-only exports that load the checkpoint directly as `CosmosTransformer3DModel` ignore the `llm_adapter.*` weights; this repo keeps the adapter and the full pipeline structure so generation follows the native Anima architecture.

Licensing follows the upstream Anima/CircleStone non-commercial license and the NVIDIA Cosmos derivative terms referenced by the upstream model card.