---
license: apache-2.0
tags:
  - diffusion
  - autoencoder
  - image-reconstruction
  - latent-space
  - pytorch
---

# data-archetype/full_capacitor

**full_capacitor** distills the FLUX.2 latent space onto the
[SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae)
architecture. It is trained in two stages. First, the
[Capacitor decoder](https://huggingface.co/data-archetype/capacitor_decoder)
is trained to decode FLUX.2 latents. Then that decoder is frozen and a matching
encoder is trained in front of it, with its latents regressed onto the FLUX.2
latents, to produce a standalone autoencoder.

## 2k PSNR Benchmark

| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | P5 (dB) | P95 (dB) |
|---|---:|---:|---:|---:|---:|
| FLUX.2 VAE | `36.28` | `4.53` | `36.07` | `28.90` | `43.63` |
| full_capacitor | `36.62` | `4.63` | `36.55` | `29.14` | `44.05` |
| Per-image delta (full_capacitor - FLUX.2) | `+0.34` | `0.68` | `0.41` | `-0.85` | `1.31` |

Evaluated on `2000` validation images.
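The delta row appears to summarize per-image PSNR differences rather than differences between the rows above. Its mean matches the difference of means (`36.62 - 36.28 = 0.34`), but its P5 is negative even though full_capacitor's P5 is higher than FLUX.2's. Below is a minimal sketch of how such statistics could be computed. It assumes images in `[-1, 1]` (data range 2) and a population standard deviation; this card states neither. `psnr` and `delta_stats` are hypothetical helpers, not part of the released code.

```python
import numpy as np


def psnr(reference, reconstruction, data_range: float = 2.0) -> float:
    """PSNR in dB. data_range=2.0 assumes pixel values in [-1, 1] (an assumption)."""
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range**2 / mse))


def delta_stats(psnr_baseline, psnr_candidate) -> dict:
    """Summary statistics of the per-image difference (candidate - baseline)."""
    delta = np.asarray(psnr_candidate, dtype=np.float64) - np.asarray(
        psnr_baseline, dtype=np.float64
    )
    return {
        "mean": float(delta.mean()),
        "std": float(delta.std()),  # population std (ddof=0), assumed
        "median": float(np.median(delta)),
        "p5": float(np.percentile(delta, 5)),
        "p95": float(np.percentile(delta, 95)),
    }
```

The mean of the per-image deltas always equals the difference of the two mean PSNRs. The median and percentiles of the deltas generally do not equal the differences of the per-model medians and percentiles, which is why those columns cannot be derived from the rows above.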

## Encode Throughput

Measured on an `NVIDIA GeForce RTX 5090` in `bfloat16`, averaging `20`
repeated batches per resolution.

| Resolution | Batch Size | FLUX.2 encode (ms/batch) | full_capacitor encode (ms/batch) | Speedup vs FLUX.2 | Peak VRAM Reduction |
|---:|---:|---:|---:|---:|---:|
| `256x256` | `128` | `383.41` | `42.56` | `9.01x` | `91.9%` |
| `512x512` | `32` | `353.58` | `44.97` | `7.86x` | `92.0%` |
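The derived columns follow from simple ratios. For example, `383.41 / 42.56 ≈ 9.01`. The sketch below shows a generic timing loop together with those ratios. The warm-up count and the `sync` hook are assumptions, because the card does not describe its exact benchmarking harness. `mean_batch_ms`, `speedup`, and `vram_reduction` are hypothetical helpers.

```python
import statistics
import time
from typing import Callable


def mean_batch_ms(
    run_batch: Callable[[], object],
    n_iters: int = 20,
    n_warmup: int = 3,
    sync: Callable[[], None] = lambda: None,
) -> float:
    """Average wall-clock milliseconds per call of `run_batch`.

    `sync` must block until queued device work finishes (for CUDA, pass
    torch.cuda.synchronize). The default no-op is only valid for synchronous CPU work.
    """
    for _ in range(n_warmup):
        run_batch()
    sync()
    times_ms = []
    for _ in range(n_iters):
        sync()
        start = time.perf_counter()
        run_batch()
        sync()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(times_ms)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    return baseline_ms / candidate_ms


def vram_reduction(baseline_peak_bytes: int, candidate_peak_bytes: int) -> float:
    """Fractional reduction in peak memory (0.919 corresponds to 91.9%)."""
    return 1.0 - candidate_peak_bytes / baseline_peak_bytes


print(f"{speedup(383.41, 42.56):.2f}x")  # 9.01x (256x256 row)
print(f"{speedup(353.58, 44.97):.2f}x")  # 7.86x (512x512 row)
```

On a GPU, peak memory for each model can be read with `torch.cuda.reset_peak_memory_stats()` before the run and `torch.cuda.max_memory_allocated()` after it.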

Latent alignment with FLUX.2 is not perfect (the cosine similarity between
posterior means is about `95%`; see the [Technical report](https://huggingface.co/data-archetype/full_capacitor/blob/main/technical_report_full_capacitor.md)),
but PCA projections of the two latent spaces look very close (see the
[Results viewer](https://huggingface.co/spaces/data-archetype/full_capacitor-results)).
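The two comparisons above can be illustrated with the sketch below. It assumes latents shaped `[N, C, H, W]` and compares whole flattened latents per image. The technical report may instead compute cosine similarity per spatial position across channels. The PCA helper projects each spatial position's channel vector onto its top principal directions, which is a common way to turn latents into RGB-like images. Both helpers are hypothetical.

```python
import numpy as np


def mean_cosine_similarity(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Mean per-sample cosine similarity between latent batches of shape [N, C, H, W].

    Each sample is flattened into a single vector before comparison (an assumption).
    """
    a_flat = a.reshape(a.shape[0], -1).astype(np.float64)
    b_flat = b.reshape(b.shape[0], -1).astype(np.float64)
    dots = np.sum(a_flat * b_flat, axis=1)
    norms = np.linalg.norm(a_flat, axis=1) * np.linalg.norm(b_flat, axis=1)
    return float(np.mean(dots / np.maximum(norms, eps)))


def latent_pca_projection(latent: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Project a [C, H, W] latent onto its top principal channel directions.

    Returns an array of shape [n_components, H, W].
    """
    c, h, w = latent.shape
    x = latent.reshape(c, -1).T.astype(np.float64)  # [H*W, C], one row per position
    x = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions in channel space, sorted by variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    proj = x @ vt[:n_components].T  # [H*W, n_components]
    return proj.T.reshape(n_components, h, w)
```

Principal components are only defined up to sign, so when comparing PCA images from two models, a flipped color channel does not indicate a real difference.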

## Latent Interface

- `encode()` returns the model's own whitened latent space.
- `decode()` expects that same whitened latent space and dewhitens internally.
- `whiten()` and `dewhiten()` are also exposed for explicit control.
- `encode_posterior()` returns the raw exported posterior `(mean, logsnr)` before whitening.

This latent interface is self-consistent for downstream latent diffusion, but
it is not a drop-in replacement for other models' latent normalization
conventions.
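This card does not specify the form of the whitening transform. The sketch below assumes a simple per-channel affine whitening (subtract a channel mean, divide by a channel standard deviation) to illustrate the contract described above. Under that assumption, `encode()` corresponds to `whiten(raw_mean)`, and `decode()` applies `dewhiten` before running the decoder. `ChannelWhitener` is hypothetical and is not the exported `whiten()`/`dewhiten()` implementation.

```python
import numpy as np


class ChannelWhitener:
    """Hypothetical per-channel affine whitening for latents shaped [N, C, H, W].

    Statistics are stored and applied in float32 regardless of input dtype.
    """

    def __init__(self, channel_mean, channel_std, eps: float = 1e-6):
        self.mean = np.asarray(channel_mean, dtype=np.float32).reshape(1, -1, 1, 1)
        self.std = np.asarray(channel_std, dtype=np.float32).reshape(1, -1, 1, 1)
        self.eps = eps

    @classmethod
    def fit(cls, raw_latents: np.ndarray) -> "ChannelWhitener":
        x = raw_latents.astype(np.float32)
        return cls(x.mean(axis=(0, 2, 3)), x.std(axis=(0, 2, 3)))

    def whiten(self, raw: np.ndarray) -> np.ndarray:
        return (raw.astype(np.float32) - self.mean) / (self.std + self.eps)

    def dewhiten(self, white: np.ndarray) -> np.ndarray:
        return white.astype(np.float32) * (self.std + self.eps) + self.mean
```

In this model, passing a raw posterior mean from `encode_posterior()` straight to `decode()` would be dewhitened a second time and land at the wrong scale. Applying a different model's latent scale or shift on top of `encode()` output would cause the same kind of mismatch.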

The export ships weights in `float32`. The recommended runtime path is
`bfloat16` for the main encoder and decoder, while whitening, dewhitening, and
other numerically sensitive inference steps remain in `float32`.
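If you assemble a custom pipeline, the precision split described above can be sketched as follows. The network runs with `bfloat16` weights and activations, and its output is upcast to `float32` before whitening. `encode_then_whiten` and the per-channel statistics are hypothetical stand-ins. `FullCapacitor.from_pretrained(..., dtype=torch.bfloat16)` is presumably meant to handle this internally.

```python
import torch


def encode_then_whiten(
    encoder: torch.nn.Module,
    image: torch.Tensor,
    channel_mean: torch.Tensor,
    channel_std: torch.Tensor,
) -> torch.Tensor:
    """Run `encoder` in bfloat16, then whiten its output in float32.

    Assumes `encoder` was already cast with .to(torch.bfloat16) and that
    channel_mean / channel_std are per-channel statistics of shape [C].
    """
    # Heavy network path: bfloat16 inputs and activations.
    raw = encoder(image.to(torch.bfloat16))
    # Numerically sensitive path: upcast before subtracting and dividing.
    mean = channel_mean.to(torch.float32).view(1, -1, 1, 1)
    std = channel_std.to(torch.float32).view(1, -1, 1, 1)
    return (raw.to(torch.float32) - mean) / std


# bfloat16 keeps only 8 significant bits, so small offsets near 1.0 vanish.
print(torch.tensor(1.0 + 2**-9, dtype=torch.bfloat16).item())  # 1.0
```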

## Usage

```python
import torch

from full_capacitor import FullCapacitor, FullCapacitorInferenceConfig


device = "cuda"
model = FullCapacitor.from_pretrained(
    "data-archetype/full_capacitor",
    device=device,
    dtype=torch.bfloat16,
)

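# Placeholder input; replace with a real image of shape [1, 3, H, W]
# with values in [-1, 1] and H, W divisible by 32.
image = torch.rand(1, 3, 512, 512) * 2 - 1

with torch.inference_mode():
    latents = model.encode(image.to(device=device, dtype=torch.bfloat16))
    recon = model.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        # Number of sampling steps for the diffusion decoder.
        inference_config=FullCapacitorInferenceConfig(num_steps=1),
    )

# Map the reconstruction from [-1, 1] back to [0, 1] for viewing or saving.
recon_01 = ((recon.float().clamp(-1, 1) + 1) / 2).cpu()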
```
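The usage snippet expects a `[1, 3, H, W]` tensor in `[-1, 1]` with `H` and `W` divisible by 32. A minimal preparation sketch follows, assuming an 8-bit RGB image already loaded as a NumPy array. It center-crops to the nearest multiple of 32. Resizing would be an equally valid choice. `prepare_image` is a hypothetical helper.

```python
import numpy as np


def prepare_image(rgb_uint8: np.ndarray, multiple: int = 32) -> np.ndarray:
    """Center-crop an [H, W, 3] uint8 image so H and W are multiples of `multiple`.

    Returns a float32 array of shape [1, 3, H', W'] with values in [-1, 1].
    """
    if rgb_uint8.ndim != 3 or rgb_uint8.shape[2] != 3:
        raise ValueError(f"expected an [H, W, 3] array, got shape {rgb_uint8.shape}")
    h, w = rgb_uint8.shape[:2]
    new_h, new_w = (h // multiple) * multiple, (w // multiple) * multiple
    if new_h == 0 or new_w == 0:
        raise ValueError(f"image {h}x{w} is smaller than {multiple}x{multiple}")
    top, left = (h - new_h) // 2, (w - new_w) // 2
    crop = rgb_uint8[top : top + new_h, left : left + new_w]
    x = crop.astype(np.float32) / 127.5 - 1.0  # [0, 255] -> [-1, 1]
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])  # HWC -> 1CHW
```

With Pillow, for example, `image = torch.from_numpy(prepare_image(np.asarray(Image.open(path).convert("RGB"))))` produces a tensor that can replace the placeholder above.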

## Details

- `full_capacitor` uses an `8`-block encoder and an `8`-block decoder.
- Cross-checks in the raw (pre-whitening) latent space show that the
  full_capacitor and FLUX.2 latent spaces remain broadly compatible. However, a
  downstream latent diffusion model moved from one to the other should still be
  expected to need some fine-tuning to adapt.
- See the [Technical report](https://huggingface.co/data-archetype/full_capacitor/blob/main/technical_report_full_capacitor.md) for further details.

## Citation

```bibtex
@misc{full_capacitor,
  title   = {Full capacitor: a {FLUX.2} VAE latent space distillation diffusion autoencoder},
  author  = {data-archetype},
  email   = {data-archetype@proton.me},
  year    = {2026},
  month   = apr,
  url     = {https://huggingface.co/data-archetype/full_capacitor},
}
```