WaveCut committed on
Commit 6a2d96d · verified · 1 Parent(s): 0febcbd

Add 1024 benchmark grid and stats

.gitattributes CHANGED
@@ -30,3 +30,4 @@ text_encoder/tokenizer.json filter=lfs diff=lfs merge=lfs -text
  tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
  images/anima_sdnq_pair_seed_424242_768x768.png filter=lfs diff=lfs merge=lfs -text
  images/anima_uint4_seed_424242_768x768.png filter=lfs diff=lfs merge=lfs -text
+ images/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -17,9 +17,9 @@ tags:
17
 
18
  # Anima Preview 3 SDNQ UINT4 Diffusers Checkpoint
19
 
20
- 4-bit uint4 static SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline.
21
 
22
- This repository is a separate full Diffusers checkpoint for `circlestone-labs/Anima` Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers`. The `transformer/` component is the WaveCut SDNQ-quantized diffusion transformer.
23
 
24
  ## Components
25
 
@@ -45,8 +45,8 @@ pipe = DiffusionPipeline.from_pretrained(
45
  trust_remote_code=True,
46
  ).to("cuda")
47
 
48
- prompt = 'masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer'
49
- negative_prompt = 'worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name'
50
 
51
  image = pipe(
52
  prompt=prompt,
@@ -61,7 +61,9 @@ image = pipe(
61
 
62
  ## Prompting
63
 
64
- Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. Recommended positive prefix:
 
 
65
 
66
  ```text
67
  masterpiece, best quality, score_7, safe,
@@ -75,25 +77,25 @@ worst quality, low quality, score_1, score_2, score_3, artist name
75
 
76
  Use lowercase tags with spaces instead of underscores, except score tags such as `score_7`. For artist tags, prefix the artist with `@`.
77
 
 
78
 
79
- ## Sample Output
80
 
81
- Comparison generated from the public Diffusers checkpoints:
82
 
83
- ![Anima SDNQ uint4 vs int8 comparison](images/anima_sdnq_pair_seed_424242_768x768.png)
84
 
85
- This checkpoint sample:
86
 
87
- ![Anima SDNQ uint4 sample](images/anima_uint4_seed_424242_768x768.png)
88
 
89
- Generation settings:
 
 
 
 
90
 
91
- - Prompt: `masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer`
92
- - Negative prompt: `worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name`
93
- - Seed: `424242`
94
- - Size: `768x768`
95
- - Steps: `24`
96
- - CFG: `4.0`
97
 
98
  ## Notes
99
 
 
17
 
18
  # Anima Preview 3 SDNQ UINT4 Diffusers Checkpoint
19
 
20
+ 4-bit uint4 static SDNQ quantization of the Anima Preview 3 diffusion transformer, packaged as a full Diffusers pipeline. This is the smallest checkpoint, with the lowest VRAM footprint, in this comparison; the companion checkpoints are listed in the benchmark table below.
21
 
22
+ This repository is a separate full Diffusers checkpoint for `circlestone-labs/Anima` Preview 3. The pipeline code and non-transformer components are based on the public Diffusers conversion `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers`. The `transformer/` component is the WaveCut SDNQ-quantized diffusion transformer converted from `WaveCut/Anima-Preview-3-SDNQ-uint4`.
23
 
24
  ## Components
25
 
 
45
  trust_remote_code=True,
46
  ).to("cuda")
47
 
48
+ prompt = "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer"
49
+ negative_prompt = "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name"
50
 
51
  image = pipe(
52
  prompt=prompt,
 
61
 
62
  ## Prompting
63
 
64
+ Anima was trained on Danbooru-style tags, natural language captions, and mixtures of both. The upstream Anima Preview 3 card recommends about 1MP generation, for example `1024x1024`, `896x1152`, or `1152x896`, with roughly 30-50 steps and CFG 4-5.
65
+
66
+ Recommended positive prefix:
67
 
68
  ```text
69
  masterpiece, best quality, score_7, safe,
 
77
 
78
  Use lowercase tags with spaces instead of underscores, except score tags such as `score_7`. For artist tags, prefix the artist with `@`.
79
 
80
+ ## 1024x1024 Comparison Grid
81
 
82
+ Five prompt/seed pairs were generated with the original BF16 Diffusers checkpoint, this UINT4 checkpoint, and the companion INT8 checkpoint. The grid JPEG is `3572x5576`; every generated cell is exactly `1024x1024` and pasted 1:1 with no resizing.
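As a quick sanity check on the 1:1 claim, the stated grid size can be compared with five rows and three columns of `1024x1024` cells. This is a minimal sketch: the helper name `leftover` is hypothetical, and how the remaining pixels are split between the label column, any header, and padding is not stated in the source.

```python
# Grid and cell sizes as stated in the README and in "grid_size" of the JSON.
GRID_W, GRID_H = 3572, 5576
CELL = 1024
COLS, ROWS = 3, 5  # three checkpoints across, five prompt/seed pairs down


def leftover(total: int, count: int, cell: int) -> int:
    """Pixels left for labels and padding once `count` unscaled cells are placed."""
    return total - count * cell


extra_w = leftover(GRID_W, COLS, CELL)  # width not covered by image cells
extra_h = leftover(GRID_H, ROWS, CELL)  # height not covered by image cells

# A non-negative leftover on both axes means the cells fit without resizing.
print(extra_w, extra_h)
```

Both leftovers are positive, which is consistent with unscaled cells plus a label column on the left.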
83
 
84
+ ![Anima Original BF16 vs SDNQ UINT4 and INT8 1024x1024 grid](images/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg)
85
 
86
+ Prompt IDs and seeds are printed in the left column of the grid. Raw benchmark data is available in [`benchmarks/benchmark_results_1024.json`](benchmarks/benchmark_results_1024.json).
87
 
88
+ ## Benchmark
89
 
90
+ Measured on an RTX 5090 32GB with `torch 2.8.0+cu128`, `diffusers 0.38.0`, `transformers 5.8.1`, `sdnq 0.1.8`, `torch.bfloat16`, 24 steps, CFG 4.0, and 1024x1024 output. Network download is excluded. Each model was loaded in a separate process; one 1024x1024 warm-up image was discarded, then five prompt/seed pairs were measured. VRAM was sampled with `nvidia-smi` every 50 ms.
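The VRAM sampling described above could look roughly like the following sketch. It is not the script used for these numbers: `parse_used_mib`, `read_used_mib`, and `PeakSampler` are hypothetical names, and only the `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits` query is a standard `nvidia-smi` invocation. Each poll spawns a process, so the effective interval may be longer than the requested 50 ms.

```python
import subprocess
import threading


def parse_used_mib(output: str) -> int:
    """Parse the query output: one integer (MiB) per GPU; only the first GPU is read."""
    return int(output.strip().splitlines()[0])


def read_used_mib() -> int:
    """Ask nvidia-smi for the memory currently in use, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_used_mib(out)


class PeakSampler:
    """Poll a reader in a background thread and remember the highest value seen."""

    def __init__(self, read=read_used_mib, interval_s=0.05):
        self._read = read
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.peak_mib = 0

    def _run(self):
        while not self._stop.is_set():
            self.peak_mib = max(self.peak_mib, self._read())
            # Sleep for the interval, but wake immediately when asked to stop.
            self._stop.wait(self._interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False
```

Wrapping the generation call in `with PeakSampler() as sampler:` and reading `sampler.peak_mib` afterwards would give a figure comparable to the "Peak VRAM while generating" column, assuming a single GPU and no other processes using it.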
91
 
92
+ | Model | Repo | Size | Load time | Mean generation | Speed vs original | VRAM after load | Peak VRAM while generating |
93
+ | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
94
+ | Original BF16 | `CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers` | 5.3 GiB | 10.04s | 6.37s/img | 1.00x | 6005 MiB | 10759 MiB |
95
+ | SDNQ UINT4 | `WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers` | 2.7 GiB (-49.1%) | 11.96s | 6.13s/img | 1.04x (+3.9%) | 3285 MiB (-45.3%) | 8157 MiB (-24.2%) |
96
+ | SDNQ INT8 | `WaveCut/Anima-Preview-3-SDNQ-int8-diffusers` | 3.5 GiB (-34.1%) | 22.41s | 4.60s/img | 1.38x (+38.4%) | 4111 MiB (-31.5%) | 8961 MiB (-16.7%) |
97
 
98
+ Quant-to-quant tradeoff in this run: UINT4 is 22.7% smaller than INT8 and uses 826 MiB less VRAM after load plus 804 MiB less peak generation VRAM. INT8 is 1.33x faster than UINT4 on this RTX 5090 setup.
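The derived columns and the quant-to-quant figures can be reproduced from the raw measurements. The sketch below copies the values from `benchmarks/benchmark_results_1024.json`; the helper names `speedup` and `pct_change` are illustrative and not part of any benchmark script in this repository.

```python
# Mean generation time (s) and VRAM (MiB) per model, copied from the benchmark JSON.
results = {
    "original": {"mean_s": 6.372352677199524, "after_load": 6005, "peak": 10759},
    "uint4": {"mean_s": 6.13110464180063, "after_load": 3285, "peak": 8157},
    "int8": {"mean_s": 4.603656611600309, "after_load": 4111, "peak": 8961},
}


def speedup(base_s: float, new_s: float) -> float:
    """How many times faster `new_s` is than `base_s` (larger is faster)."""
    return base_s / new_s


def pct_change(new: float, base: float) -> float:
    """Signed percentage change of `new` relative to `base`."""
    return (new - base) / base * 100


base = results["original"]
for key in ("uint4", "int8"):
    row = results[key]
    print(
        key,
        f"{speedup(base['mean_s'], row['mean_s']):.2f}x",
        f"{pct_change(row['after_load'], base['after_load']):+.1f}% after load",
        f"{pct_change(row['peak'], base['peak']):+.1f}% peak",
    )

# Quant-to-quant comparison: INT8 speed relative to UINT4, and VRAM gaps in MiB.
int8_vs_uint4 = speedup(results["uint4"]["mean_s"], results["int8"]["mean_s"])
after_load_gap = results["int8"]["after_load"] - results["uint4"]["after_load"]
peak_gap = results["int8"]["peak"] - results["uint4"]["peak"]
print(f"{int8_vs_uint4:.2f}x", after_load_gap, peak_gap)
```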
 
 
 
 
 
99
 
100
  ## Notes
101
 
benchmarks/benchmark_results_1024.json ADDED
@@ -0,0 +1,216 @@
+ {
+   "hardware": "NVIDIA GeForce RTX 5090 32GB",
+   "software": {
+     "torch": "2.8.0+cu128",
+     "diffusers": "0.38.0",
+     "transformers": "5.8.1",
+     "sdnq": "0.1.8"
+   },
+   "benchmark_note": "Network download excluded. One 1024x1024 warm-up generation per model, then five measured 1024x1024 generations. VRAM sampled with nvidia-smi every 50 ms in an isolated process per model.",
+   "width": 1024,
+   "height": 1024,
+   "steps": 24,
+   "guidance_scale": 4.0,
+   "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+   "prompts": [
+     {
+       "id": "fern",
+       "seed": 424242,
+       "prompt": "masterpiece, best quality, score_7, safe, 1girl, fern (sousou no frieren), purple hair, purple eyes, black robe, white dress, butterfly on hand, simple background, looking at viewer"
+     },
+     {
+       "id": "city",
+       "seed": 424243,
+       "prompt": "masterpiece, best quality, score_7, safe, anime screenshot, 1girl, short black hair, red jacket, standing on a rainy neon city street at night, reflections, cinematic lighting"
+     },
+     {
+       "id": "witch",
+       "seed": 424244,
+       "prompt": "masterpiece, best quality, score_7, safe, 1girl, witch hat, silver hair, blue eyes, starry sky, floating books, glowing magic circle, detailed illustration"
+     },
+     {
+       "id": "mecha",
+       "seed": 424245,
+       "prompt": "masterpiece, best quality, score_7, safe, 1boy, pilot suit, white mecha in the background, sunset hangar, dramatic rim light, anime key visual"
+     },
+     {
+       "id": "garden",
+       "seed": 424246,
+       "prompt": "masterpiece, best quality, score_7, safe, 2girls, summer dresses, flower garden, butterflies, warm sunlight, soft watercolor anime style"
+     }
+   ],
+   "models": [
+     {
+       "key": "original",
+       "title": "Original BF16",
+       "path": "/root/anima-transformers-convert/original-full",
+       "repo": "CalamitousFelicitousness/Anima-Preview-3-sdnext-diffusers",
+       "hardware": "NVIDIA GeForce RTX 5090 32GB",
+       "dtype": "torch.bfloat16",
+       "width": 1024,
+       "height": 1024,
+       "steps": 24,
+       "guidance_scale": 4.0,
+       "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+       "baseline_vram_mib": 511,
+       "load_seconds": 10.04116036000778,
+       "vram_after_load_mib": 6005,
+       "vram_load_peak_mib": 6005,
+       "vram_generation_peak_mib": 10759,
+       "torch_peak_allocated_mib": 9669,
+       "runs": [
+         {
+           "prompt_id": "fern",
+           "seed": 424242,
+           "seconds": 6.371356149989879,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/fern_original_seed_424242_1024x1024.png"
+         },
+         {
+           "prompt_id": "city",
+           "seed": 424243,
+           "seconds": 6.3718316220038105,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/city_original_seed_424243_1024x1024.png"
+         },
+         {
+           "prompt_id": "witch",
+           "seed": 424244,
+           "seconds": 6.374521128003835,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/witch_original_seed_424244_1024x1024.png"
+         },
+         {
+           "prompt_id": "mecha",
+           "seed": 424245,
+           "seconds": 6.371869497001171,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/mecha_original_seed_424245_1024x1024.png"
+         },
+         {
+           "prompt_id": "garden",
+           "seed": 424246,
+           "seconds": 6.372184988998924,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/garden_original_seed_424246_1024x1024.png"
+         }
+       ],
+       "mean_generation_seconds": 6.372352677199524,
+       "relative_to_original_speedup": 1.0,
+       "vram_after_load_delta_vs_original_mib": 0,
+       "vram_generation_peak_delta_vs_original_mib": 0
+     },
+     {
+       "key": "uint4",
+       "title": "SDNQ UINT4",
+       "path": "/root/anima-transformers-convert/full/Anima-SDNQ-uint4-diffusers",
+       "repo": "WaveCut/Anima-Preview-3-SDNQ-uint4-diffusers",
+       "hardware": "NVIDIA GeForce RTX 5090 32GB",
+       "dtype": "torch.bfloat16",
+       "width": 1024,
+       "height": 1024,
+       "steps": 24,
+       "guidance_scale": 4.0,
+       "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+       "baseline_vram_mib": 511,
+       "load_seconds": 11.955643722001696,
+       "vram_after_load_mib": 3285,
+       "vram_load_peak_mib": 3181,
+       "vram_generation_peak_mib": 8157,
+       "torch_peak_allocated_mib": 6971,
+       "runs": [
+         {
+           "prompt_id": "fern",
+           "seed": 424242,
+           "seconds": 6.849568051999086,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/fern_uint4_seed_424242_1024x1024.png"
+         },
+         {
+           "prompt_id": "city",
+           "seed": 424243,
+           "seconds": 5.868479846001719,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/city_uint4_seed_424243_1024x1024.png"
+         },
+         {
+           "prompt_id": "witch",
+           "seed": 424244,
+           "seconds": 6.189502780995099,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/witch_uint4_seed_424244_1024x1024.png"
+         },
+         {
+           "prompt_id": "mecha",
+           "seed": 424245,
+           "seconds": 5.836763394996524,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/mecha_uint4_seed_424245_1024x1024.png"
+         },
+         {
+           "prompt_id": "garden",
+           "seed": 424246,
+           "seconds": 5.911209135010722,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/garden_uint4_seed_424246_1024x1024.png"
+         }
+       ],
+       "mean_generation_seconds": 6.13110464180063,
+       "relative_to_original_speedup": 1.0393482169190384,
+       "vram_after_load_delta_vs_original_mib": -2720,
+       "vram_generation_peak_delta_vs_original_mib": -2602
+     },
+     {
+       "key": "int8",
+       "title": "SDNQ INT8",
+       "path": "/root/anima-transformers-convert/full/Anima-SDNQ-int8-diffusers",
+       "repo": "WaveCut/Anima-Preview-3-SDNQ-int8-diffusers",
+       "hardware": "NVIDIA GeForce RTX 5090 32GB",
+       "dtype": "torch.bfloat16",
+       "width": 1024,
+       "height": 1024,
+       "steps": 24,
+       "guidance_scale": 4.0,
+       "negative_prompt": "worst quality, low quality, score_1, score_2, score_3, blurry, jpeg artifacts, artist name",
+       "baseline_vram_mib": 511,
+       "load_seconds": 22.4127801930008,
+       "vram_after_load_mib": 4111,
+       "vram_load_peak_mib": 4049,
+       "vram_generation_peak_mib": 8961,
+       "torch_peak_allocated_mib": 7798,
+       "runs": [
+         {
+           "prompt_id": "fern",
+           "seed": 424242,
+           "seconds": 4.61064092599554,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/fern_int8_seed_424242_1024x1024.png"
+         },
+         {
+           "prompt_id": "city",
+           "seed": 424243,
+           "seconds": 4.606765301999985,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/city_int8_seed_424243_1024x1024.png"
+         },
+         {
+           "prompt_id": "witch",
+           "seed": 424244,
+           "seconds": 4.597769348009024,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/witch_int8_seed_424244_1024x1024.png"
+         },
+         {
+           "prompt_id": "mecha",
+           "seed": 424245,
+           "seconds": 4.587051768990932,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/mecha_int8_seed_424245_1024x1024.png"
+         },
+         {
+           "prompt_id": "garden",
+           "seed": 424246,
+           "seconds": 4.616055713006062,
+           "image": "/root/anima-transformers-convert/benchmark_1024/images/garden_int8_seed_424246_1024x1024.png"
+         }
+       ],
+       "mean_generation_seconds": 4.603656611600309,
+       "relative_to_original_speedup": 1.3841937431089992,
+       "vram_after_load_delta_vs_original_mib": -1894,
+       "vram_generation_peak_delta_vs_original_mib": -1798
+     }
+   ],
+   "grid": "/root/anima-transformers-convert/benchmark_1024/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg",
+   "grid_size": {
+     "width": 3572,
+     "height": 5576,
+     "cell_width": 1024,
+     "cell_height": 1024
+   }
+ }
images/anima_original_uint4_int8_grid_5x3_1024x1024_1to1.jpg ADDED

Git LFS Details

  • SHA256: d908c9ac7a81c3decd66b86aaec1eff4405ab774e1d9e8884e3f2ab1de07c909
  • Pointer size: 132 Bytes
  • Size of remote file: 4.02 MB