---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- pytorch
library_name: irdiffae
---

# data-archetype/irdiffae-v1

**iRDiffAE** — **iR**epa **Diff**usion **A**uto**E**ncoder. A fast, single-GPU-trainable diffusion autoencoder with spatially structured latents for rapid downstream model convergence. Encoding runs ~5× faster than Flux VAE; single-step decoding runs ~3× faster.

## Model Variants

| Variant | Patch | Channels | Compression | Notes |
|---------|-------|----------|-------------|-------|
| [irdiffae_v1](https://huggingface.co/data-archetype/irdiffae_v1) | 16x16 | 128 | 6x | recommended |

This variant (data-archetype/irdiffae-v1): 121.0M parameters, 461.4 MB.

## Documentation

- [Technical Report](technical_report.md) — diffusion math, architecture, training, and results
- [Results — interactive viewer](https://huggingface.co/spaces/data-archetype/irdiffae-results) — full-resolution side-by-side comparison
- [Results — summary stats](technical_report.md#7-results) — metrics and per-image PSNR

## Quick Start

```python
import torch
from ir_diffae import IRDiffAE

# Load from HuggingFace Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")

# Encode
images = ...  # [B, 3, H, W] in [-1, 1], H and W divisible by 16
latents = model.encode(images)

# Decode (1 step by default — PSNR-optimal)
recon = model.decode(latents, height=H, width=W)

# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```

> **Note:** Requires `pip install huggingface_hub safetensors` for Hub downloads.
> You can also pass a local directory path to `from_pretrained()`.

## Architecture

| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |

**Encoder**: Deterministic.
Patchify (PixelUnshuffle + 1x1 conv) followed by DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with learned residual gates.

**Decoder**: VP diffusion conditioned on encoder latents and timestep via shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) -> middle blocks (4) -> skip fusion -> end blocks (2). Supports Path-Drop Guidance (PDG) at inference for a quality/speed tradeoff.

## Recommended Settings

Best quality is achieved with just **1 DDIM step** and PDG disabled, making inference extremely fast. PDG (strength 2-4) can optionally increase perceptual sharpness but is easy to overdo.

| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |

```python
from ir_diffae import IRDiffAEInferenceConfig

# PSNR-optimal (fast, 1 step)
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)
```

## Citation

```bibtex
@misc{irdiffae_v1,
  title  = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
  author = {data-archetype},
  year   = {2026},
  month  = feb,
  url    = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```

## Dependencies

- PyTorch >= 2.0
- safetensors (for loading weights)

## License

Apache 2.0
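## Input Preparation

The encoder expects images in `[-1, 1]` with height and width divisible by the patch size (16). A minimal sketch of that preprocessing; `pad_to_multiple` and `to_model_range` are illustrative helpers written for this card, not part of the `ir_diffae` API:

```python
import math

def pad_to_multiple(h: int, w: int, multiple: int = 16) -> tuple[int, int]:
    """Round spatial dims up to the next multiple of the patch size,
    since the encoder patchifies the image into 16x16 patches."""
    return (math.ceil(h / multiple) * multiple,
            math.ceil(w / multiple) * multiple)

def to_model_range(x: float) -> float:
    """Map a pixel value from [0, 1] to the model's expected [-1, 1] range.
    On a torch tensor this is simply `images * 2 - 1`, applied elementwise."""
    return x * 2.0 - 1.0

# Example: a 517x389 image must be padded (or cropped) to 528x400
# before calling model.encode().
print(pad_to_multiple(517, 389))  # (528, 400)
```

The padded target sizes can then be applied with e.g. `torch.nn.functional.pad` before encoding, and the original `height`/`width` passed to `model.decode()`.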