Update decoder benchmark docs for RTX 5090 and GH200
--- a/README.md
+++ b/README.md
@@ -18,15 +18,18 @@ architecture.
 
 ## Decode Speed
 
+### RTX 5090
+
 | Resolution | Speedup vs FLUX.2 | Peak VRAM Reduction | capacitor_decoder (ms/image) | FLUX.2 VAE (ms/image) | capacitor_decoder peak VRAM | FLUX.2 peak VRAM |
 |---:|---:|---:|---:|---:|---:|---:|
-| `512x512` | `
-| `1024x1024` | `
-| `2048x2048` | `
+| `512x512` | `6.15x` | `61.5%` | `3.89` | `23.94` | `356.2 MiB` | `925.5 MiB` |
+| `1024x1024` | `11.98x` | `80.8%` | `9.86` | `118.19` | `540.2 MiB` | `2815.2 MiB` |
+| `2048x2048` | `10.81x` | `87.7%` | `52.12` | `563.28` | `1277.8 MiB` | `10371.8 MiB` |
 
-These measurements are decode-only
-
-timed over the same cached latent
+These measurements are decode-only and were run on an `NVIDIA GeForce RTX 5090`.
+Each image is first encoded once with the same FLUX.2 encoder, latents are
+cached in memory, and then both decoders are timed over the same cached latent
+set.
 
 ## 2k PSNR Benchmark
 
@@ -124,4 +127,3 @@ upstream and call `decode(..., latents_are_flux2_whitened=False)`.
 url = {https://huggingface.co/data-archetype/capacitor_decoder},
 }
 ```
-
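The decode-only protocol described in the added README text (encode each image once, cache the latents, then time both decoders over the same cached set) and the two derived table columns can be sketched as below. This is a minimal illustration, not the benchmark script behind the reported numbers: the names `time_decoder`, `benchmark_decoders`, `speedup`, and `vram_reduction_pct` are hypothetical, the encoder and decoders are arbitrary callables, and the `sync` hook is an assumption standing in for whatever device synchronization a GPU run would need (for example `torch.cuda.synchronize`). Peak VRAM measurement is not shown.

```python
import time
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence


def time_decoder(
    decode: Callable[[Any], Any],
    latents: Sequence[Any],
    sync: Callable[[], None] = lambda: None,
    warmup: int = 1,
) -> float:
    """Return the mean wall-clock decode time in milliseconds per image."""
    # Untimed warmup passes so one-off setup cost is not counted.
    for latent in latents[:warmup]:
        decode(latent)
    sync()

    per_image_ms: List[float] = []
    for latent in latents:
        start = time.perf_counter()
        decode(latent)
        sync()  # assumption: wait for asynchronous device work before stopping the clock
        per_image_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(per_image_ms)


def benchmark_decoders(
    images: Sequence[Any],
    encode: Callable[[Any], Any],
    decoders: Dict[str, Callable[[Any], Any]],
    sync: Callable[[], None] = lambda: None,
    warmup: int = 1,
) -> Dict[str, float]:
    """Encode once, cache latents in memory, time every decoder on the same latents."""
    latents = [encode(image) for image in images]  # encoding happens outside the timed region
    return {
        name: time_decoder(decode, latents, sync=sync, warmup=warmup)
        for name, decode in decoders.items()
    }


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of baseline time to candidate time (the 'Speedup vs FLUX.2' column)."""
    return baseline_ms / candidate_ms


def vram_reduction_pct(baseline_mib: float, candidate_mib: float) -> float:
    """Percentage drop in peak memory relative to the baseline."""
    return (1.0 - candidate_mib / baseline_mib) * 100.0
```

Recomputing the derived columns from the rounded table entries gives values within about 0.01 of the reported speedups and within 0.05 percentage points of the reported VRAM reductions. For the `512x512` row, `speedup(23.94, 3.89)` is roughly 6.154 and `vram_reduction_pct(925.5, 356.2)` is roughly 61.51. Small differences, such as about 11.987 against the reported `11.98x` at `1024x1024`, would be expected if the reported ratios were computed from unrounded timings.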