---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- latent-space
- pytorch
---
# data-archetype/full_capacitor
**full_capacitor** distills the FLUX.2 latent space onto the
[SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
architecture. It is trained in two stages. First, the
[Capacitor decoder](https://huggingface.co/data-archetype/capacitor_decoder)
is trained to decode FLUX.2 latents. That decoder is then frozen, and a matching
encoder is trained on top of it with its latents regressed against the FLUX.2
latents, producing a standalone autoencoder.
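
The second stage described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the actual training code: the tiny convolutions are hypothetical stand-ins for the FLUX.2 VAE encoder, the full_capacitor encoder, and the Capacitor decoder, and the mean-squared-error regression loss, optimizer, and shapes are assumptions, since the card does not specify them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins (assumptions): the real teacher is the FLUX.2 VAE
# encoder, the real student is the full_capacitor encoder, and the real
# decoder is the stage-1 Capacitor decoder. These only mimic the data flow.
teacher = nn.Conv2d(3, 4, kernel_size=8, stride=8)
student = nn.Conv2d(3, 4, kernel_size=8, stride=8)
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)

# Stage 2: everything except the student encoder is frozen.
for frozen in (teacher, decoder):
    frozen.requires_grad_(False)
    frozen.eval()

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
images = torch.rand(8, 3, 32, 32) * 2 - 1  # fake batch in [-1, 1]

losses = []
for _ in range(50):
    with torch.no_grad():
        target = teacher(images)  # latents to regress against
    # Assumed loss: plain MSE between student and teacher latents.
    loss = nn.functional.mse_loss(student(images), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# The trained encoder and the frozen decoder together form the autoencoder.
with torch.no_grad():
    recon = decoder(student(images))
```

Because only `student.parameters()` are handed to the optimizer, the decoder learned in stage one stays exactly as it was.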
## 2k PSNR Benchmark
| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | P5 (dB) | P95 (dB) |
|---|---:|---:|---:|---:|---:|
| FLUX.2 VAE | `36.28` | `4.53` | `36.07` | `28.90` | `43.63` |
| full_capacitor | `36.62` | `4.63` | `36.55` | `29.14` | `44.05` |
| Delta | `+0.34` | `0.68` | `+0.41` | `-0.85` | `+1.31` |

Evaluated on `2000` validation images.
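
For readers who want to reproduce this kind of table on their own images, the sketch below shows one way to compute per-image PSNR and the summary columns. It is an assumption-laden illustration: the card does not state the pixel range used for the peak value, whether the standard deviation is the sample or population form, or how percentiles are interpolated, so `data_range`, `statistics.stdev`, and the inclusive quantile method are choices made here.

```python
import math
import statistics


def psnr(reference, reconstruction, data_range=1.0):
    """PSNR in dB between two equally sized flat sequences of pixel values.

    `data_range` is the distance between the smallest and largest possible
    pixel value (1.0 for [0, 1] images, 2.0 for [-1, 1] images).
    """
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of values")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)


def summarize(values):
    """Mean, sample std, median, and 5th/95th percentiles of per-image scores."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "median": statistics.median(values),
        "p5": cuts[0],
        "p95": cuts[-1],
    }
```

Calling `summarize` on the list of per-image PSNR values for each model gives one table row per model.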
## Encode Throughput
Measured on an `NVIDIA GeForce RTX 5090` in `bfloat16`, averaging `20`
repeated batches per resolution.

| Resolution | Batch Size | FLUX.2 encode (ms/batch) | full_capacitor encode (ms/batch) | Speedup vs FLUX.2 | Peak VRAM Reduction |
|---:|---:|---:|---:|---:|---:|
| `256x256` | `128` | `383.41` | `42.56` | `9.01x` | `91.9%` |
| `512x512` | `32` | `353.58` | `44.97` | `7.86x` | `92.0%` |
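
The speedup column is the ratio of the two ms/batch columns. The short sketch below recomputes it from the table values and also derives images per second, a figure the table does not report; the helper names are illustrative only.

```python
# (resolution, batch size, FLUX.2 ms/batch, full_capacitor ms/batch) from the table.
rows = [
    ("256x256", 128, 383.41, 42.56),
    ("512x512", 32, 353.58, 44.97),
]


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def images_per_second(batch_size, ms_per_batch):
    """Throughput implied by a batch size and a per-batch latency."""
    return batch_size / (ms_per_batch / 1000.0)


report = []
for resolution, batch, flux_ms, cap_ms in rows:
    entry = {
        "resolution": resolution,
        "speedup": speedup(flux_ms, cap_ms),
        "flux_img_per_s": images_per_second(batch, flux_ms),
        "capacitor_img_per_s": images_per_second(batch, cap_ms),
    }
    report.append(entry)
    print(
        f"{resolution}: {entry['speedup']:.2f}x, "
        f"{entry['capacitor_img_per_s']:.0f} img/s vs {entry['flux_img_per_s']:.0f} img/s"
    )
```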

Latent alignment with FLUX.2 is not perfect (posterior-mean cosine similarity is about `95%`;
see [Technical report](https://huggingface.co/data-archetype/full_capacitor/blob/main/technical_report_full_capacitor.md)),
but PCA projections of the two latent spaces are very close (see
[Results viewer](https://huggingface.co/spaces/data-archetype/full_capacitor-results)).
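
One plausible way to measure a posterior-mean cosine similarity is sketched below. The reduction used here (cosine over the channel dimension at each spatial position, then averaged) is an assumption; the technical report may define the metric differently, and `mean_latent_cosine` is a hypothetical helper, not part of the package.

```python
import torch
import torch.nn.functional as F


def mean_latent_cosine(student_mean, teacher_mean):
    """Average channel-wise cosine similarity over batch and spatial positions.

    Both inputs are [B, C, H, W] posterior means expressed in the same
    (raw, unwhitened) latent space.
    """
    if student_mean.shape != teacher_mean.shape:
        raise ValueError("latent shapes must match")
    # Compare in float32 so bfloat16 rounding does not distort the metric.
    per_position = F.cosine_similarity(
        student_mean.float(), teacher_mean.float(), dim=1
    )
    return per_position.mean().item()
```

A value of `1.0` means the two encoders point in the same direction at every latent position, so `0.95` leaves a small but real angular gap.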
## Latent Interface
- `encode()` returns the model's own whitened latent space.
- `decode()` expects that same whitened latent space and dewhitens internally.
- `whiten()` and `dewhiten()` are also exposed for explicit control.
- `encode_posterior()` returns the raw exported posterior `(mean, logsnr)` before whitening.

This latent interface is self-consistent for downstream latent diffusion, but
it is not a drop-in replacement for other models' latent normalization
conventions.

The export ships weights in `float32`. The recommended runtime path is
`bfloat16` for the main encoder and decoder, while whitening, dewhitening, and
other numerically sensitive inference steps remain in `float32`.
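
To illustrate why the whitening steps are kept in `float32`, here is a minimal round-trip sketch. It assumes a simple per-channel affine whitening with made-up statistics; the model's actual whitening transform and stored statistics are not described in this card, and `whiten_per_channel` and `dewhiten_per_channel` are hypothetical stand-ins rather than the model's own `whiten()` and `dewhiten()`.

```python
import torch


def whiten_per_channel(raw, mean, std):
    """Map raw latents [B, C, H, W] to zero-mean, unit-variance channels."""
    raw32 = raw.to(torch.float32)  # upcast before applying the statistics
    return (raw32 - mean.view(1, -1, 1, 1)) / std.view(1, -1, 1, 1)


def dewhiten_per_channel(white, mean, std):
    """Inverse of whiten_per_channel, also computed in float32."""
    white32 = white.to(torch.float32)
    return white32 * std.view(1, -1, 1, 1) + mean.view(1, -1, 1, 1)


torch.manual_seed(0)
# Hypothetical per-channel statistics (assumption, not the shipped values).
mean = torch.tensor([0.5, -1.0, 2.0, 0.0])
std = torch.tensor([2.0, 0.5, 1.5, 3.0])

# Pretend this came out of a bfloat16 encoder.
raw_bf16 = (
    torch.randn(2, 4, 8, 8) * std.view(1, -1, 1, 1) + mean.view(1, -1, 1, 1)
).to(torch.bfloat16)

white = whiten_per_channel(raw_bf16, mean, std)
restored = dewhiten_per_channel(white, mean, std)
```

Keeping the affine step in `float32` means the round trip adds no rounding error beyond what the `bfloat16` encoder output already carries.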
## Usage
```python
import torch

from full_capacitor import FullCapacitor, FullCapacitorInferenceConfig

device = "cuda"

# Load the float32 export and run the encoder and decoder in bfloat16.
model = FullCapacitor.from_pretrained(
    "data-archetype/full_capacitor",
    device=device,
    dtype=torch.bfloat16,
)

image = ...  # [1, 3, H, W] in [-1, 1], H and W divisible by 32

with torch.inference_mode():
    # encode() returns whitened latents; decode() dewhitens internally.
    latents = model.encode(image.to(device=device, dtype=torch.bfloat16))
    recon = model.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        inference_config=FullCapacitorInferenceConfig(num_steps=1),
    )
```
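
The `image` placeholder above needs a `[1, 3, H, W]` tensor in `[-1, 1]` with `H` and `W` divisible by 32. A small preprocessing helper along these lines could produce one from raw pixels. `prepare_image` is a hypothetical name, and center-cropping to the nearest multiple of 32 is just one choice; resizing or padding would also satisfy the constraint.

```python
import torch


def prepare_image(pixels_uint8, multiple=32):
    """Convert an [H, W, 3] uint8 tensor to [1, 3, H', W'] floats in [-1, 1].

    H' and W' are the largest multiples of `multiple` that fit inside the
    input, taken as a center crop.
    """
    if (
        pixels_uint8.dtype != torch.uint8
        or pixels_uint8.ndim != 3
        or pixels_uint8.shape[-1] != 3
    ):
        raise ValueError("expected an [H, W, 3] uint8 tensor")
    height, width = pixels_uint8.shape[:2]
    new_h = height - height % multiple
    new_w = width - width % multiple
    if new_h == 0 or new_w == 0:
        raise ValueError(f"image must be at least {multiple}x{multiple}")
    top = (height - new_h) // 2
    left = (width - new_w) // 2
    crop = pixels_uint8[top : top + new_h, left : left + new_w]
    chw = crop.permute(2, 0, 1).to(torch.float32)
    # Map 0..255 to -1..1 and add the batch dimension.
    return (chw / 127.5 - 1.0).unsqueeze(0)
```

The result can be moved to the device and cast to `bfloat16` exactly as in the usage snippet.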
## Details
- `full_capacitor` uses an `8`-block encoder and an `8`-block decoder.
- Raw-space cross-checks show that the full_capacitor and FLUX.2 latent spaces
remain broadly compatible, but a downstream latent diffusion model moved from
one to the other should still be expected to need some adaptation time.
- [Technical report](https://huggingface.co/data-archetype/full_capacitor/blob/main/technical_report_full_capacitor.md)
## Citation
```bibtex
@misc{full_capacitor,
title = {Full capacitor: a Flux.2 VAE latent space distillation diffusion autoencoder},
author = {data-archetype},
email = {data-archetype@proton.me},
year = {2026},
month = apr,
url = {https://huggingface.co/data-archetype/full_capacitor},
}
```