# data-archetype/irdiffae-v1
iRDiffAE – iRepa Diffusion AutoEncoder. A fast, single-GPU-trainable diffusion autoencoder with spatially structured latents for rapid downstream model convergence. Encoding runs ~5× faster than Flux VAE; single-step decoding runs ~3× faster.
## Model Variants
| Variant | Patch | Channels | Compression | Notes |
|---|---|---|---|---|
| irdiffae_v1 | 16×16 | 128 | 6× | recommended |
This variant (data-archetype/irdiffae-v1): 121.0M parameters, 461.4 MB.
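The 6× compression figure follows directly from the patch and channel sizes: each 16×16 RGB patch holds 3 · 16 · 16 = 768 values, which the encoder maps to a single 128-channel latent vector. A quick arithmetic check:

```python
# Each 16x16 RGB patch (3 * 16 * 16 = 768 values) maps to one
# 128-channel latent vector, giving the 6x compression factor.
patch, in_channels, latent_channels = 16, 3, 128
values_per_patch = in_channels * patch * patch  # 768
compression = values_per_patch / latent_channels
print(compression)  # -> 6.0
```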
## Documentation
- Technical Report – diffusion math, architecture, training, and results
- Results – interactive viewer – full-resolution side-by-side comparisons
- Results – summary stats – metrics and per-image PSNR
## Quick Start
```python
import torch
from ir_diffae import IRDiffAE

# Load from the HuggingFace Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")

# Encode
images = ...  # [B, 3, H, W] in [-1, 1], H and W divisible by 16
H, W = images.shape[-2:]
latents = model.encode(images)

# Decode (1 step by default; PSNR-optimal)
recon = model.decode(latents, height=H, width=W)

# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```
Note: requires `pip install huggingface_hub safetensors` for Hub downloads. You can also pass a local directory path to `from_pretrained()`.
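Since `encode` expects H and W divisible by 16, inputs of arbitrary size need padding first. A minimal helper sketch (not part of the `ir_diffae` API) that zero-pads the spatial dimensions up to the next multiple of the patch size:

```python
import numpy as np

def pad_to_multiple(img, multiple=16):
    """Zero-pad an array shaped [..., H, W] so H and W are divisible
    by `multiple` (the model's patch size). Illustrative helper only;
    not part of the ir_diffae package."""
    h, w = img.shape[-2:]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    pad = [(0, 0)] * (img.ndim - 2) + [(0, pad_h), (0, pad_w)]
    return np.pad(img, pad)

x = np.zeros((3, 250, 333))
print(pad_to_multiple(x).shape)  # -> (3, 256, 336)
```

Cropping the reconstruction back to the original H × W afterwards undoes the padding.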
## Architecture
| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |
**Encoder:** Deterministic. Patchify (PixelUnshuffle + 1×1 conv) followed by DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with learned residual gates.

**Decoder:** VP diffusion conditioned on encoder latents and timestep via shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) → middle blocks (4) → skip fusion → end blocks (2). Supports Path-Drop Guidance (PDG) at inference for a quality/speed tradeoff.
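A rough numpy sketch of the shared-base + per-layer low-rank AdaLN-Zero idea. All names and shapes here are illustrative assumptions, not the actual implementation: each layer's modulation parameters are a shared learned base vector plus a rank-128 projection of the conditioning embedding, with the low-rank up-projection zero-initialized (the "Zero" in AdaLN-Zero) so every layer starts at the shared base.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank = 896, 128  # model dim and AdaLN rank from the table above

# Shared base modulation (shift, scale, gate) plus a per-layer
# low-rank correction computed from the conditioning embedding c.
base = rng.normal(size=(3 * dim,))
down = rng.normal(size=(dim, rank)) / np.sqrt(dim)
up = np.zeros((rank, 3 * dim))  # zero-init: correction starts at 0

def modulate(x, c):
    # x: [dim] token features, c: [dim] conditioning (timestep + latent)
    shift, scale, gate = np.split(base + (c @ down) @ up, 3)
    x_norm = (x - x.mean()) / (x.std() + 1e-6)  # layernorm, no affine
    return gate * (x_norm * (1.0 + scale) + shift)

out = modulate(rng.normal(size=(dim,)), rng.normal(size=(dim,)))
print(out.shape)  # -> (896,)
```

The appeal of the low-rank scheme is parameter count: each layer adds only `dim * rank + rank * 3 * dim` modulation weights instead of a full `dim * 3 * dim` projection.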
## Recommended Settings
Best quality is achieved with just 1 DDIM step and PDG disabled, making inference extremely fast. PDG (strength 2-4) can optionally increase perceptual sharpness but is easy to overdo.
| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |
```python
from ir_diffae import IRDiffAEInferenceConfig

# PSNR-optimal (fast, 1 step)
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)
```
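For intuition on why a single step suffices: one DDIM step under a VP schedule is just an x0-prediction. Given the decoder's noise estimate ε̂ at the chosen timestep, the reconstruction is x0 = (x_t − √(1−ᾱ)·ε̂) / √ᾱ. This is the standard DDIM formula, not iRDiffAE-specific code; the sketch below uses a stand-in noise predictor that returns the true noise, so the step recovers x0 exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_bar = 0.9  # VP cumulative signal level at the sampled timestep

x0_true = rng.normal(size=(3, 16, 16))
eps = rng.normal(size=x0_true.shape)
x_t = np.sqrt(alpha_bar) * x0_true + np.sqrt(1 - alpha_bar) * eps

def eps_hat(x_t, t):
    # Stand-in for the decoder's noise prediction; returning the true
    # noise makes the single DDIM step exact.
    return eps

x0 = (x_t - np.sqrt(1 - alpha_bar) * eps_hat(x_t, t=1.0)) / np.sqrt(alpha_bar)
print(np.allclose(x0, x0_true))  # -> True
```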
## Citation
```bibtex
@misc{irdiffae_v1,
  title  = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
  author = {data-archetype},
  year   = {2026},
  month  = feb,
  url    = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```
## Dependencies
- PyTorch >= 2.0
- safetensors (for loading weights)
## License
Apache 2.0