# data-archetype/irdiffae-v1
iRDiffAE – iRepa Diffusion AutoEncoder. A fast, single-GPU-trainable diffusion autoencoder with spatially structured latents for rapid downstream model convergence. Encoding runs ~5× faster than Flux VAE; single-step decoding runs ~3× faster.
## Model Variants
| Variant | Patch | Channels | Compression | Notes |
|---|---|---|---|---|
| irdiffae_v1 | 16×16 | 128 | 6× | recommended |
This variant (data-archetype/irdiffae-v1): 121.0M parameters, 461.4 MB.
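The 6× compression figure follows directly from the patch and channel sizes: each 16×16 RGB patch holds 3 · 16 · 16 = 768 values, which the encoder maps to a single 128-channel latent vector. A quick arithmetic check:

```python
# Each 16x16 RGB patch (3 * 16 * 16 = 768 values) maps to one
# 128-channel latent vector, giving the 6x compression factor.
patch, in_channels, latent_channels = 16, 3, 128
values_per_patch = in_channels * patch * patch  # 768
compression = values_per_patch / latent_channels
print(compression)  # -> 6.0
```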
## Documentation
- Technical Report – diffusion math, architecture, training, and results
- Results – interactive viewer – full-resolution side-by-side comparisons
- Results – summary stats – metrics and per-image PSNR
## Quick Start
```python
import torch
from ir_diffae import IRDiffAE

# Load from the HuggingFace Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")

# Encode
images = ...  # [B, 3, H, W] in [-1, 1], H and W divisible by 16
H, W = images.shape[-2:]
latents = model.encode(images)

# Decode (1 step by default; PSNR-optimal)
recon = model.decode(latents, height=H, width=W)

# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```
Note: requires `pip install huggingface_hub safetensors` for Hub downloads. You can also pass a local directory path to `from_pretrained()`.
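Since `encode` expects H and W divisible by 16, inputs of arbitrary size need padding first. A minimal helper sketch (not part of the `ir_diffae` API) that zero-pads the spatial dimensions up to the next multiple of the patch size:

```python
import numpy as np

def pad_to_multiple(img, multiple=16):
    """Zero-pad an array shaped [..., H, W] so H and W are divisible
    by `multiple` (the model's patch size). Illustrative helper only;
    not part of the ir_diffae package."""
    h, w = img.shape[-2:]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    pad = [(0, 0)] * (img.ndim - 2) + [(0, pad_h), (0, pad_w)]
    return np.pad(img, pad)

x = np.zeros((3, 250, 333))
print(pad_to_multiple(x).shape)  # -> (3, 256, 336)
```

Cropping the reconstruction back to the original H × W afterwards undoes the padding.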
## Architecture
| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |
**Encoder:** Deterministic. Patchify (PixelUnshuffle + 1×1 conv) followed by DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with learned residual gates.

**Decoder:** VP diffusion conditioned on encoder latents and timestep via shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) → middle blocks (4) → skip fusion → end blocks (2). Supports Path-Drop Guidance (PDG) at inference for a quality/speed tradeoff.
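A rough numpy sketch of the shared-base + per-layer low-rank AdaLN-Zero idea. All names and shapes here are illustrative assumptions, not the actual implementation: each layer's modulation parameters are a shared learned base vector plus a rank-128 projection of the conditioning embedding, with the low-rank up-projection zero-initialized (the "Zero" in AdaLN-Zero) so every layer starts at the shared base.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank = 896, 128  # model dim and AdaLN rank from the table above

# Shared base modulation (shift, scale, gate) plus a per-layer
# low-rank correction computed from the conditioning embedding c.
base = rng.normal(size=(3 * dim,))
down = rng.normal(size=(dim, rank)) / np.sqrt(dim)
up = np.zeros((rank, 3 * dim))  # zero-init: correction starts at 0

def modulate(x, c):
    # x: [dim] token features, c: [dim] conditioning (timestep + latent)
    shift, scale, gate = np.split(base + (c @ down) @ up, 3)
    x_norm = (x - x.mean()) / (x.std() + 1e-6)  # layernorm, no affine
    return gate * (x_norm * (1.0 + scale) + shift)

out = modulate(rng.normal(size=(dim,)), rng.normal(size=(dim,)))
print(out.shape)  # -> (896,)
```

The appeal of the low-rank scheme is parameter count: each layer adds only `dim * rank + rank * 3 * dim` modulation weights instead of a full `dim * 3 * dim` projection.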
## Recommended Settings
Best quality is achieved with just 1 DDIM step and PDG disabled, making inference extremely fast. PDG (strength 2-4) can optionally increase perceptual sharpness but is easy to overdo.
| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |
```python
from ir_diffae import IRDiffAEInferenceConfig

# PSNR-optimal (fast, 1 step)
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)
```
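For intuition on why a single step suffices: one DDIM step under a VP schedule is just an x0-prediction. Given the decoder's noise estimate ε̂ at the chosen timestep, the reconstruction is x0 = (x_t − √(1−ᾱ)·ε̂) / √ᾱ. This is the standard DDIM formula, not iRDiffAE-specific code; the sketch below uses a stand-in noise predictor that returns the true noise, so the step recovers x0 exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_bar = 0.9  # VP cumulative signal level at the sampled timestep

x0_true = rng.normal(size=(3, 16, 16))
eps = rng.normal(size=x0_true.shape)
x_t = np.sqrt(alpha_bar) * x0_true + np.sqrt(1 - alpha_bar) * eps

def eps_hat(x_t, t):
    # Stand-in for the decoder's noise prediction; returning the true
    # noise makes the single DDIM step exact.
    return eps

x0 = (x_t - np.sqrt(1 - alpha_bar) * eps_hat(x_t, t=1.0)) / np.sqrt(alpha_bar)
print(np.allclose(x0, x0_true))  # -> True
```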
## Citation
```bibtex
@misc{irdiffae_v1,
  title  = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
  author = {data-archetype},
  year   = {2026},
  month  = feb,
  url    = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```
## Dependencies
- PyTorch >= 2.0
- safetensors (for loading weights)
## License
Apache 2.0