Fix technical_report_fcdm_diffae.md benchmark methodology
technical_report_fcdm_diffae.md +24 -12
@@ -170,18 +170,30 @@ The fixed 39-image set is exported separately as a reconstruction viewer with
 side-by-side originals, semdisdiffae_p32_v2 reconstructions, error deltas, and
 PCA visualizations. FLUX.2 VAE reconstructions are included for comparison.
 
-## Throughput
-
-Measured on an `NVIDIA GeForce RTX 5090` in `bfloat16`,
-[9 further removed lines; content not preserved]
+## Encode Throughput
+
+Measured on an `NVIDIA GeForce RTX 5090` in `bfloat16`, averaging `20`
+repeated batched `encode()` calls after `5` warmup batches.
+
+| Resolution | Batch Size | Mean (ms/batch) | Median (ms/batch) | P95 (ms/batch) | ms/image | Images/s | Peak Allocated VRAM |
+|---:|---:|---:|---:|---:|---:|---:|---:|
+| `256x256` | `128` | `12.54` | `12.52` | `12.86` | `0.098` | `10206.3` | `567.8 MiB` |
+| `512x512` | `32` | `12.09` | `12.12` | `12.33` | `0.378` | `2647.2` | `563.8 MiB` |
+
+## Decode Latency
+
+Measured on the same `NVIDIA GeForce RTX 5090` in `bfloat16`. This is
+decode-only latency: `20` images are encoded once per resolution, latents are
+cached, and timing is sequential batch-1 `decode()` over the cached latent set
+with the default 1-step sampler and PDG disabled. This follows the
+`capacitor_decoder` decode benchmark methodology and does not include encode
+time.
+
+| Resolution | Batch Size | Images | Mean (ms/image) | Median (ms/image) | P95 (ms/image) | Images/s | Peak Allocated VRAM |
+|---:|---:|---:|---:|---:|---:|---:|---:|
+| `512x512` | `1` | `20` | `5.11` | `5.10` | `5.27` | `195.6` | `340.8 MiB` |
+| `1024x1024` | `1` | `20` | `10.14` | `10.16` | `10.22` | `98.6` | `409.6 MiB` |
+| `2048x2048` | `1` | `20` | `53.86` | `53.95` | `53.98` | `18.6` | `720.9 MiB` |
 
 ## VP Stability
 
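
The measurement procedure described in the added sections (untimed warmup calls, a fixed number of timed calls, then mean, median, P95, and the derived per-image figures) can be sketched as below. This is a minimal illustration, not the benchmark script behind the report: the names `benchmark` and `summarize`, the nearest-rank P95 definition, and wall-clock timing with `time.perf_counter` are all assumptions. For GPU work the timed callable would also need to block until the device has finished (for example by calling `torch.cuda.synchronize()` in PyTorch) before the clock is read.

```python
import statistics
import time
from typing import Callable, Dict, List


def summarize(samples_ms: List[float], batch_size: int) -> Dict[str, float]:
    """Reduce per-call latencies (ms) to the columns used in the tables."""
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile (an assumed definition; the report does
    # not say which percentile method it uses). Integer math avoids float
    # rounding in the rank computation.
    rank = max(1, (95 * len(ordered) + 99) // 100)
    mean_ms = statistics.fmean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "median_ms": statistics.median(samples_ms),
        "p95_ms": ordered[rank - 1],
        "ms_per_image": mean_ms / batch_size,
        "images_per_s": (batch_size * 1000.0 / mean_ms) if mean_ms > 0 else float("inf"),
    }


def benchmark(fn: Callable[[], object], batch_size: int,
              warmup: int = 5, repeats: int = 20) -> Dict[str, float]:
    """Run `fn` `warmup` times untimed, then time `repeats` calls."""
    for _ in range(warmup):
        fn()  # warmup calls are excluded from the statistics
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # for GPU work, fn must synchronize the device before returning
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(samples_ms, batch_size)


if __name__ == "__main__":
    # Stand-in CPU workload; a real run would wrap a batched encode() call,
    # or a batch-1 decode() over cached latents, in the callable instead.
    stats = benchmark(lambda: sum(range(10_000)), batch_size=1)
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

Under these definitions, a mean of `12.54` ms per batch at batch size `128` gives about `0.098` ms/image and roughly `10207` images/s. The reported `10206.3` is presumably computed from the unrounded mean.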