Add 1024 benchmark grid and stats

Files changed (4)

.gitattributes +1 -0
README.md +19 -17
benchmarks/benchmark_results_1024.json +216 -0
images/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg +3 -0

.gitattributes CHANGED

@@ -30,3 +30,4 @@ text_encoder/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
 images/anima_sdnq_pair_seed_424242_768x768.png filter=lfs diff=lfs merge=lfs -text
 images/anima_uint4_seed_424242_768x768.png filter=lfs diff=lfs merge=lfs -text

+images/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg filter=lfs diff=lfs merge=lfs -text

README.md CHANGED

@@ -17,9 +17,9 @@ tags:
 # Anima Preview 3 SDNQ UINT4 Diffusers Checkpoint
-4-bit uint4 static SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline.
+4-bit uint4 static SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline. This is the smallest checkpoint and lowest VRAM footprint in this comparison; the companion checkpoints are listed in the benchmark table below.
-This repository is a separate full Diffusers checkpoint for `circlestone-labs/Anima` Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers`. The `transformer/` component is the WaveCut SDNQ-quantized diffusion transformer.
+This repository is a separate full Diffusers checkpoint for `circlestone-labs/Anima` Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers`. The `transformer/` component is the WaveCut SDNQ-quantized diffusion transformer converted from `WaveCut/Anima-Preview-3-SDNQ-uint4`.
 ## Components
@@ -45,8 +45,8 @@ pipe = DiffusionPipeline.from_pretrained(
     trust_remote_code=True,
 ).to("cuda")
-prompt = 'masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer'
-negative_prompt = 'worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name'
+prompt = "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer"
+negative_prompt = "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name"
 image = pipe(
     prompt=prompt,
@@ -61,7 +61,9 @@ image = pipe(
 ## Prompting
-Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. Recommended positive prefix:
+Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. The upstream Anima Preview 3 card recommends about 1MP generation, for example `1024x1024`, `896x1152`, or `1152x896`, with roughly 30-50 steps and CFG 4-5.
+Recommended positive prefix:
 ```text
 masterpiece, best quality, score_7, safe,
@@ -75,25 +77,25 @@ worst quality, low quality, score_1, score_2, score_3, artist name
 Use lowercase tags with spaces instead of underscores, except score tags such as `score_7`. For artist tags, prefix the artist with `@`.
-## Sample Output
-Comparison generated from the public Diffusers checkpoints:
-![Anima SDNQ uint4 vs int8 comparison](images/anima_sdnq_pair_seed_424242_768x768.png)
-This checkpoint sample:
-![Anima SDNQ uint4 sample](images/anima_uint4_seed_424242_768x768.png)
-Generation settings:
-- Prompt: `masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer`
-- Negative prompt: `worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name`
-- Seed: `424242`
-- Size: `768x768`
-- Steps: `24`
-- CFG: `4.0`
+## 1024x1024 Comparison Grid
+Five prompt/seed pairs were generated with the original BF16 Diffusers checkpoint, this UINT4 checkpoint, and the companion INT8 checkpoint. The source JPEG is `3572x5576`; every generated cell is exactly `1024x1024` and pasted 1:1 with no resizing.
+![Anima Original BF16 vs SDNQ UINT4 and INT8 1024x1024 grid](images/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg)
+Prompt IDs and seeds are printed in the left column of the grid. Raw benchmark data is available in [`benchmarks/benchmark_results_1024.json`](benchmarks/benchmark_results_1024.json).
+## Benchmark
+Measured on an RTX 5090 32GB with `torch 2.8.0+cu128`, `diffusers 0.38.0`, `transformers 5.8.1`, `sdnq 0.1.8`, `torch.bfloat16`, 24 steps, CFG 4.0, and 1024x1024 output. Network download is excluded. Each model was loaded in a separate process; one 1024x1024 warm-up image was discarded, then five prompt/seed pairs were measured. VRAM was sampled with `nvidia-smi` every 50 ms.
+| Model | Repo | Size | Load time | Mean generation | Speed vs original | VRAM after load | Peak VRAM while generating |
+| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
+| Original BF16 | `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers` | 5.3 GiB | 10.04s | 6.37s/img | 1.00x | 6005 MiB | 10759 MiB |
+| SDNQ UINT4 | `WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers` | 2.7 GiB (-49.1%) | 11.96s | 6.13s/img | 1.04x (+3.9%) | 3285 MiB (-45.3%) | 8157 MiB (-24.2%) |
+| SDNQ INT8 | `WaveCut/Anima-Preview-3-SDNQ-int8-diffusers` | 3.5 GiB (-34.1%) | 22.41s | 4.60s/img | 1.38x (+38.4%) | 4111 MiB (-31.5%) | 8961 MiB (-16.7%) |
+Quant-to-quant tradeoff in this run: UINT4 is 22.7% smaller than INT8 and uses 826 MiB less VRAM after load plus 804 MiB less peak generation VRAM. INT8 is 1.33x faster than UINT4 on this RTX 5090 setup.
 ## Notes
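
Review note on the README's Benchmark paragraph: it says VRAM was sampled with `nvidia-smi` every 50 ms, but the benchmark script is not part of this commit. The following is only a minimal sketch of how such a sampler could look. `read_used_mib`, `VramSampler`, and the single-GPU default are hypothetical names and assumptions, not the author's code. Spawning `nvidia-smi` for each sample has its own cost, so the real spacing between samples may be longer than the requested interval.

```python
import subprocess
import threading


def read_used_mib(gpu_index=0):
    """Return the used memory (MiB) that nvidia-smi reports for one GPU."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=memory.used",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip().splitlines()[0])


class VramSampler:
    """Poll a reader function in a background thread and keep the peak value.

    Hypothetical helper: `reader` defaults to the nvidia-smi call above but
    can be swapped for any zero-argument callable returning MiB.
    """

    def __init__(self, reader=read_used_mib, interval_s=0.05):
        self.reader = reader
        self.interval_s = interval_s
        self.peak_mib = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak_mib = max(self.peak_mib, self.reader())
            # wait() returns early once stop is requested
            self._stop.wait(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        # Take one last sample so short-lived work is not missed entirely.
        self.peak_mib = max(self.peak_mib, self.reader())
        return False
```

A measurement would then wrap the generation call, for example `with VramSampler() as s: pipe(...)`, and read `s.peak_mib` afterwards. Because `nvidia-smi` reports whole-device usage, a figure obtained this way includes the idle baseline (recorded as `baseline_vram_mib` in the JSON) and will usually sit above `torch_peak_allocated_mib`.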

benchmarks/benchmark_results_1024.json ADDED

@@ -0,0 +1,216 @@

+{
+  "hardware": "NVIDIA GeForce RTX 5090 32GB",
+  "software": {
+    "torch": "2.8.0+cu128",
+    "diffusers": "0.38.0",
+    "transformers": "5.8.1",
+    "sdnq": "0.1.8"
+  },
+  "benchmark_note": "Network download excluded. One 1024x1024 warm-up generation per model, then five measured 1024x1024 generations. VRAM sampled with nvidia-smi every 50 ms in an isolated process per model.",
+  "width": 1024,
+  "height": 1024,
+  "steps": 24,
+  "guidance_scale": 4.0,
+  "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+  "prompts": [
+    {
+      "id": "fern",
+      "seed": 424242,
+      "prompt": "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer"
+    },
+    {
+      "id": "city",
+      "seed": 424243,
+      "prompt": "masterpiece, best quality, score_7, safe, anime screenshot, 1girl, short black hair, red jacket, standing on a rainy neon city street at night, reflections, cinematic lighting"
+    },
+    {
+      "id": "witch",
+      "seed": 424244,
+      "prompt": "masterpiece, best quality, score_7, safe, 1girl, witch hat, silver hair, blue eyes, starry sky, floating books, glowing magic circle, detailed illustration"
+    },
+    {
+      "id": "mecha",
+      "seed": 424245,
+      "prompt": "masterpiece, best quality, score_7, safe, 1boy, pilot suit, white mecha in the background, sunset hangar, dramatic rim light, anime key visual"
+    },
+    {
+      "id": "garden",
+      "seed": 424246,
+      "prompt": "masterpiece, best quality, score_7, safe, 2girls, summer dresses, flower garden, butterflies, warm sunlight, soft watercolor anime style"
+    }
+  ],
+  "models": [
+    {
+      "key": "original",
+      "title": "Original BF16",
+      "path": "/root/anima-transformers-convert/original-full",
+      "repo": "CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers",
+      "hardware": "NVIDIA GeForce RTX 5090 32GB",
+      "dtype": "torch.bfloat16",
+      "width": 1024,
+      "height": 1024,
+      "steps": 24,
+      "guidance_scale": 4.0,
+      "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+      "baseline_vram_mib": 511,
+      "load_seconds": 10.04116036000778,
+      "vram_after_load_mib": 6005,
+      "vram_load_peak_mib": 6005,
+      "vram_generation_peak_mib": 10759,
+      "torch_peak_allocated_mib": 9669,
+      "runs": [
+        {
+          "prompt_id": "fern",
+          "seed": 424242,
+          "seconds": 6.371356149989879,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/fern_original_seed_424242_1024x1024.png"
+        },
+        {
+          "prompt_id": "city",
+          "seed": 424243,
+          "seconds": 6.3718316220038105,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/city_original_seed_424243_1024x1024.png"
+        },
+        {
+          "prompt_id": "witch",
+          "seed": 424244,
+          "seconds": 6.374521128003835,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/witch_original_seed_424244_1024x1024.png"
+        },
+        {
+          "prompt_id": "mecha",
+          "seed": 424245,
+          "seconds": 6.371869497001171,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/mecha_original_seed_424245_1024x1024.png"
+        },
+        {
+          "prompt_id": "garden",
+          "seed": 424246,
+          "seconds": 6.372184988998924,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/garden_original_seed_424246_1024x1024.png"
+        }
+      ],
+      "mean_generation_seconds": 6.372352677199524,
+      "relative_to_original_speedup": 1.0,
+      "vram_after_load_delta_vs_original_mib": 0,
+      "vram_generation_peak_delta_vs_original_mib": 0
+    },
+    {
+      "key": "uint4",
+      "title": "SDNQ UINT4",
+      "path": "/root/anima-transformers-convert/full/Anima-SDNQ-uint4-diffusers",
+      "repo": "WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers",
+      "hardware": "NVIDIA GeForce RTX 5090 32GB",
+      "dtype": "torch.bfloat16",
+      "width": 1024,
+      "height": 1024,
+      "steps": 24,
+      "guidance_scale": 4.0,
+      "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+      "baseline_vram_mib": 511,
+      "load_seconds": 11.955643722001696,
+      "vram_after_load_mib": 3285,
+      "vram_load_peak_mib": 3181,
+      "vram_generation_peak_mib": 8157,
+      "torch_peak_allocated_mib": 6971,
+      "runs": [
+        {
+          "prompt_id": "fern",
+          "seed": 424242,
+          "seconds": 6.849568051999086,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/fern_uint4_seed_424242_1024x1024.png"
+        },
+        {
+          "prompt_id": "city",
+          "seed": 424243,
+          "seconds": 5.868479846001719,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/city_uint4_seed_424243_1024x1024.png"
+        },
+        {
+          "prompt_id": "witch",
+          "seed": 424244,
+          "seconds": 6.189502780995099,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/witch_uint4_seed_424244_1024x1024.png"
+        },
+        {
+          "prompt_id": "mecha",
+          "seed": 424245,
+          "seconds": 5.836763394996524,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/mecha_uint4_seed_424245_1024x1024.png"
+        },
+        {
+          "prompt_id": "garden",
+          "seed": 424246,
+          "seconds": 5.911209135010722,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/garden_uint4_seed_424246_1024x1024.png"
+        }
+      ],
+      "mean_generation_seconds": 6.13110464180063,
+      "relative_to_original_speedup": 1.0393482169190384,
+      "vram_after_load_delta_vs_original_mib": -2720,
+      "vram_generation_peak_delta_vs_original_mib": -2602
+    },
+    {
+      "key": "int8",
+      "title": "SDNQ INT8",
+      "path": "/root/anima-transformers-convert/full/Anima-SDNQ-int8-diffusers",
+      "repo": "WaveCut/Anima-Preview-3-SDNQ-int8-diffusers",
+      "hardware": "NVIDIA GeForce RTX 5090 32GB",
+      "dtype": "torch.bfloat16",
+      "width": 1024,
+      "height": 1024,
+      "steps": 24,
+      "guidance_scale": 4.0,
+      "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+      "baseline_vram_mib": 511,
+      "load_seconds": 22.4127801930008,
+      "vram_after_load_mib": 4111,
+      "vram_load_peak_mib": 4049,
+      "vram_generation_peak_mib": 8961,
+      "torch_peak_allocated_mib": 7798,
+      "runs": [
+        {
+          "prompt_id": "fern",
+          "seed": 424242,
+          "seconds": 4.61064092599554,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/fern_int8_seed_424242_1024x1024.png"
+        },
+        {
+          "prompt_id": "city",
+          "seed": 424243,
+          "seconds": 4.606765301999985,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/city_int8_seed_424243_1024x1024.png"
+        },
+        {
+          "prompt_id": "witch",
+          "seed": 424244,
+          "seconds": 4.597769348009024,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/witch_int8_seed_424244_1024x1024.png"
+        },
+        {
+          "prompt_id": "mecha",
+          "seed": 424245,
+          "seconds": 4.587051768990932,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/mecha_int8_seed_424245_1024x1024.png"
+        },
+        {
+          "prompt_id": "garden",
+          "seed": 424246,
+          "seconds": 4.616055713006062,
+          "image": "/root/anima-transformers-convert/benchmark_1024/images/garden_int8_seed_424246_1024x1024.png"
+        }
+      ],
+      "mean_generation_seconds": 4.603656611600309,
+      "relative_to_original_speedup": 1.3841937431089992,
+      "vram_after_load_delta_vs_original_mib": -1894,
+      "vram_generation_peak_delta_vs_original_mib": -1798
+    }
+  ],
+  "grid": "/root/anima-transformers-convert/benchmark_1024/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg",
+  "grid_size": {
+    "width": 3572,
+    "height": 5576,
+    "cell_width": 1024,
+    "cell_height": 1024
+  }
+}
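
Review note: the derived columns in the README table (mean generation time, speedup, VRAM deltas and percentages) can be recomputed from the raw values in this JSON. The sketch below copies the per-run timings and VRAM readings inline so it runs without the file. `summarize` is a hypothetical helper name, and it is assumed that speedup is the baseline mean divided by the model mean, which matches the stored `relative_to_original_speedup` values.

```python
from statistics import mean

# Per-run generation times in seconds, copied from the "runs" arrays above.
RUNS = {
    "original": [6.371356149989879, 6.3718316220038105, 6.374521128003835,
                 6.371869497001171, 6.372184988998924],
    "uint4": [6.849568051999086, 5.868479846001719, 6.189502780995099,
              5.836763394996524, 5.911209135010722],
    "int8": [4.61064092599554, 4.606765301999985, 4.597769348009024,
             4.587051768990932, 4.616055713006062],
}

# (vram_after_load_mib, vram_generation_peak_mib) per model.
VRAM = {
    "original": (6005, 10759),
    "uint4": (3285, 8157),
    "int8": (4111, 8961),
}


def summarize(runs, vram, baseline="original"):
    """Recompute mean time, speedup, and VRAM deltas relative to a baseline."""
    base_mean = mean(runs[baseline])
    base_load, base_peak = vram[baseline]
    rows = {}
    for key, seconds in runs.items():
        load, peak = vram[key]
        rows[key] = {
            "mean_s": mean(seconds),
            "speedup": base_mean / mean(seconds),
            "load_delta_mib": load - base_load,
            "load_delta_pct": 100 * (load - base_load) / base_load,
            "peak_delta_mib": peak - base_peak,
            "peak_delta_pct": 100 * (peak - base_peak) / base_peak,
        }
    return rows


rows = summarize(RUNS, VRAM)
for key, row in rows.items():
    print(f"{key:8s} {row['mean_s']:.2f}s/img  {row['speedup']:.2f}x  "
          f"load {row['load_delta_pct']:+.1f}%  peak {row['peak_delta_pct']:+.1f}%")
```

The same dictionary also gives the quant-to-quant figure: `rows["uint4"]["mean_s"] / rows["int8"]["mean_s"]` is the INT8-over-UINT4 speed ratio quoted in the README.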

images/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg ADDED

Git LFS Details

SHA256: d908c9ac7a81c3decd66b86aaec1eff4405ab774e1d9e8884e3f2ab1de07c909
Pointer size: 132 Bytes
Size of remote file: 4.02 MB
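
Review note: the README states that every cell in this grid is pasted 1:1 at `1024x1024`, and the JSON records a `3572x5576` canvas. A quick arithmetic check shows that a layout of 3 model columns by 5 prompt rows fits, with space left over for labels. `leftover_pixels` is a hypothetical helper. How the spare pixels are split between the label column, headers, and padding is not stated in the commit, so the sketch only reports the totals.

```python
def leftover_pixels(grid_w, grid_h, cell_w, cell_h, cols, rows):
    """Return (spare_width, spare_height) after placing cols x rows unscaled cells."""
    used_w = cols * cell_w
    used_h = rows * cell_h
    if used_w > grid_w or used_h > grid_h:
        raise ValueError("cells do not fit on the canvas without resizing")
    return grid_w - used_w, grid_h - used_h


# Values from "grid_size" in benchmark_results_1024.json; 3 models x 5 prompts.
spare_w, spare_h = leftover_pixels(3572, 5576, 1024, 1024, cols=3, rows=5)
print(spare_w, spare_h)  # 500 456
```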