data-archetype committed on
Commit 3fd2fb9 · verified · 1 Parent(s): d036923

Fix technical_report_fcdm_diffae.md benchmark methodology

Files changed (1): technical_report_fcdm_diffae.md +24 -12
technical_report_fcdm_diffae.md CHANGED
@@ -170,18 +170,30 @@ The fixed 39-image set is exported separately as a reconstruction viewer with
  side-by-side originals, semdisdiffae_p32_v2 reconstructions, error deltas, and
  PCA visualizations. FLUX.2 VAE reconstructions are included for comparison.
 
- ## Throughput
-
- Measured on an `NVIDIA GeForce RTX 5090` in `bfloat16`, with `5` warmup batches
- and `20` timed batches. Decode uses the default 1-step sampler with PDG
- disabled.
-
- | Operation | Resolution | Batch Size | Mean (ms/batch) | Median (ms/batch) | P95 (ms/batch) | ms/image | Images/s | Peak Allocated VRAM |
- |---|---:|---:|---:|---:|---:|---:|---:|---:|
- | Encode | `256x256` | `128` | `12.57` | `12.41` | `13.27` | `0.098` | `10186.8` | `574 MiB` |
- | Decode | `256x256` | `128` | `98.93` | `99.22` | `100.16` | `0.773` | `1293.9` | `1042 MiB` |
- | Encode | `512x512` | `32` | `12.08` | `11.98` | `12.46` | `0.377` | `2649.9` | `579 MiB` |
- | Decode | `512x512` | `32` | `100.36` | `99.39` | `105.84` | `3.136` | `318.8` | `1042 MiB` |
+ ## Encode Throughput
+
+ Measured on an `NVIDIA GeForce RTX 5090` in `bfloat16`, averaging `20`
+ repeated batched `encode()` calls after `5` warmup batches.
+
+ | Resolution | Batch Size | Mean (ms/batch) | Median (ms/batch) | P95 (ms/batch) | ms/image | Images/s | Peak Allocated VRAM |
+ |---:|---:|---:|---:|---:|---:|---:|---:|
+ | `256x256` | `128` | `12.54` | `12.52` | `12.86` | `0.098` | `10206.3` | `567.8 MiB` |
+ | `512x512` | `32` | `12.09` | `12.12` | `12.33` | `0.378` | `2647.2` | `563.8 MiB` |
+
+ ## Decode Latency
+
+ Measured on the same `NVIDIA GeForce RTX 5090` in `bfloat16`. This is
+ decode-only latency: `20` images are encoded once per resolution, latents are
+ cached, and timing is sequential batch-1 `decode()` over the cached latent set
+ with the default 1-step sampler and PDG disabled. This follows the
+ `capacitor_decoder` decode benchmark methodology and does not include encode
+ time.
+
+ | Resolution | Batch Size | Images | Mean (ms/image) | Median (ms/image) | P95 (ms/image) | Images/s | Peak Allocated VRAM |
+ |---:|---:|---:|---:|---:|---:|---:|---:|
+ | `512x512` | `1` | `20` | `5.11` | `5.10` | `5.27` | `195.6` | `340.8 MiB` |
+ | `1024x1024` | `1` | `20` | `10.14` | `10.16` | `10.22` | `98.6` | `409.6 MiB` |
+ | `2048x2048` | `1` | `20` | `53.86` | `53.95` | `53.98` | `18.6` | `720.9 MiB` |
 
  ## VP Stability
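
The table columns in the revised sections can be related to each other with a small amount of arithmetic. The following is a minimal sketch, not the benchmark script used for this commit: `time_calls` and `summarize` are hypothetical helper names, the warmup and timed counts mirror the `5` and `20` stated in the report, and the nearest-rank definition of P95 is an assumption, since the report does not say how its percentile was computed.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def time_calls(fn: Callable[[], object], warmup: int = 5, timed: int = 20) -> List[float]:
    """Return wall-clock milliseconds for `timed` calls made after `warmup` untimed calls.

    Caveat: for GPU work the caller's `fn` must block until the device has
    finished (for example by synchronizing inside `fn`), otherwise only the
    kernel launch time is measured.
    """
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(timed):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


def summarize(samples_ms: Sequence[float], batch_size: int) -> Dict[str, float]:
    """Derive the table columns from per-batch timings in milliseconds."""
    ordered = sorted(samples_ms)
    # Nearest-rank P95 (assumed definition): ceil(0.95 * n), in integer arithmetic.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    mean_ms = statistics.fmean(ordered)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "ms_per_image": mean_ms / batch_size,
        "images_per_s": batch_size * 1000.0 / mean_ms,
    }
```

As a consistency check, a mean of `12.54` ms per batch at batch size `128` gives about `0.098` ms/image and roughly `10207` images/s, which matches the `256x256` encode row up to rounding of the reported mean. For the batch-1 decode table, `batch_size=1` makes ms/image equal to the per-call mean.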