---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- latent-space
- pytorch
---

# data-archetype/full_capacitor

**full_capacitor** distills the FLUX.2 latent space onto the [SemDisDiffAE](https://huggingface.co/data-archetype/semdisdiffae) architecture. It is trained in two stages. First, the [Capacitor decoder](https://huggingface.co/data-archetype/capacitor_decoder) is trained to decode FLUX.2 latents. Then that decoder is frozen and a matching encoder is trained on top of it, with its latents regressed against the FLUX.2 latents, to produce a standalone autoencoder.

## 2k PSNR Benchmark

| Model | Mean PSNR (dB) | Std (dB) | Median (dB) | P5 (dB) | P95 (dB) |
|---|---:|---:|---:|---:|---:|
| FLUX.2 VAE | `36.28` | `4.53` | `36.07` | `28.90` | `43.63` |
| full_capacitor | `36.62` | `4.63` | `36.55` | `29.14` | `44.05` |
| Delta | `+0.34` | `0.68` | `0.41` | `-0.85` | `1.31` |

Evaluated on `2000` validation images.

## Encode Throughput

Measured on an `NVIDIA GeForce RTX 5090` in `bfloat16`, averaging `20` repeated batches per resolution.

| Resolution | Batch Size | FLUX.2 encode (ms/batch) | full_capacitor encode (ms/batch) | Speedup vs FLUX.2 | Peak VRAM Reduction |
|---:|---:|---:|---:|---:|---:|
| `256x256` | `128` | `383.41` | `42.56` | `9.01x` | `91.9%` |
| `512x512` | `32` | `353.58` | `44.97` | `7.86x` | `92.0%` |

Latent alignment with FLUX.2 is not perfect (posterior-mean cosine similarity is about `95%`; see the [Technical report](https://huggingface.co/data-archetype/full_capacitor/blob/main/technical_report_full_capacitor.md)), but the latent PCA is very close (see the [Results viewer](https://huggingface.co/spaces/data-archetype/full_capacitor-results)).

## Latent Interface

- `encode()` returns the model's own whitened latent space.
- `decode()` expects that same whitened latent space and dewhitens internally.
- `whiten()` and `dewhiten()` are also exposed for explicit control.
- `encode_posterior()` returns the raw exported posterior `(mean, logsnr)` before whitening.
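
The round-trip relationship between whitening and dewhitening can be illustrated with a small, dependency-free sketch. It assumes a per-channel affine transform (subtract a mean, divide by a standard deviation); the actual transform and statistics used by full_capacitor are not described here, and the helper names `fit_channel_stats`, `whiten_sample`, and `dewhiten_sample` are hypothetical, not part of the model's API.

```python
import statistics


def fit_channel_stats(latents):
    """Return a (mean, std) pair per channel.

    `latents` is a list of samples; each sample is a list of channels,
    and each channel is a flat list of float values.
    """
    num_channels = len(latents[0])
    stats = []
    for c in range(num_channels):
        values = [v for sample in latents for v in sample[c]]
        mean = statistics.fmean(values)
        # Fall back to 1.0 so a constant channel does not divide by zero.
        std = statistics.pstdev(values) or 1.0
        stats.append((mean, std))
    return stats


def whiten_sample(sample, stats):
    """Map raw latents to zero mean and unit variance per channel (assumed form)."""
    return [[(v - m) / s for v in ch] for ch, (m, s) in zip(sample, stats)]


def dewhiten_sample(sample, stats):
    """Invert whiten_sample, recovering the raw latent values."""
    return [[v * s + m for v in ch] for ch, (m, s) in zip(sample, stats)]


# Toy data: 2 samples, 2 channels, 2 values per channel.
raw_latents = [
    [[1.0, 3.0], [10.0, 30.0]],
    [[5.0, 7.0], [50.0, 70.0]],
]
channel_stats = fit_channel_stats(raw_latents)
whitened = [whiten_sample(s, channel_stats) for s in raw_latents]
restored = [dewhiten_sample(s, channel_stats) for s in whitened]
```

With real tensors the same arithmetic would apply per channel; the point is only that a decoder which dewhitens internally must be fed latents whitened with matching statistics.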

This latent interface is self-consistent for downstream latent diffusion, but it is not a drop-in replacement for other models' latent normalization conventions.

The export ships weights in `float32`. The recommended runtime path is `bfloat16` for the main encoder and decoder, while whitening, dewhitening, and other numerically sensitive inference steps remain in `float32`.

## Usage

```python
import torch

from full_capacitor import FullCapacitor, FullCapacitorInferenceConfig

device = "cuda"

# Load the exported weights and run the encoder/decoder in bfloat16.
model = FullCapacitor.from_pretrained(
    "data-archetype/full_capacitor",
    device=device,
    dtype=torch.bfloat16,
)

image = ...  # [1, 3, H, W] in [-1, 1], H and W divisible by 32

with torch.inference_mode():
    # encode() returns whitened latents.
    latents = model.encode(image.to(device=device, dtype=torch.bfloat16))
    # decode() dewhitens internally, so pass the latents through unchanged.
    recon = model.decode(
        latents,
        height=int(image.shape[-2]),
        width=int(image.shape[-1]),
        inference_config=FullCapacitorInferenceConfig(num_steps=1),
    )
```

## Details

- `full_capacitor` uses an `8`-block encoder and an `8`-block decoder.
- Raw-space cross checks show that the full_capacitor and FLUX.2 latent spaces remain broadly compatible, but a downstream latent diffusion model moved from one to the other should still be expected to need some adaptation time.
- [Technical report](https://huggingface.co/data-archetype/full_capacitor/blob/main/technical_report_full_capacitor.md)

## Citation

```bibtex
@misc{full_capacitor,
  title  = {Full capacitor: a Flux.2 VAE latent space distillation diffusion autoencoder},
  author = {data-archetype},
  email  = {data-archetype@proton.me},
  year   = {2026},
  month  = apr,
  url    = {https://huggingface.co/data-archetype/full_capacitor},
}
```