---
license: apache-2.0
tags:
  - diffusion
  - autoencoder
  - image-reconstruction
  - pytorch
library_name: irdiffae
---

# data-archetype/irdiffae-v1

**iRDiffAE** (iRepa Diffusion AutoEncoder) is a fast, single-GPU-trainable diffusion autoencoder with spatially structured latents for rapid downstream model convergence. Encoding runs ~5× faster than the Flux VAE; single-step decoding runs ~3× faster.

## Model Variants

| Variant | Patch size | Channels | Compression |
|---|---|---|---|
| `irdiffae_v1` (recommended) | 16×16 | 128 | 6× |

This variant (`data-archetype/irdiffae-v1`): 121.0M parameters, 461.4 MB.
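To see where the 6× figure comes from, here is a back-of-envelope check. It assumes the latent keeps one 128-dim vector per 16×16 patch, i.e. a `[128, H/16, W/16]` grid (the exact latent layout is an assumption, inferred from the patch size and bottleneck dim above):

```python
# Hypothetical arithmetic sketch, not part of the ir_diffae API.
# Each 16x16 RGB patch holds 16 * 16 * 3 = 768 values; the encoder
# maps it to a single 128-channel latent vector.
patch, in_ch, bottleneck = 16, 3, 128
compression = (patch * patch * in_ch) / bottleneck
print(compression)  # 6.0

# Assumed latent grid for a 512x512 input: one vector per patch.
H = W = 512
latent_shape = (bottleneck, H // patch, W // patch)
print(latent_shape)  # (128, 32, 32)
```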

## Documentation

### Quick Start

```python
import torch
from ir_diffae import IRDiffAE

# Load from the Hugging Face Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")

# Encode
images = ...  # [B, 3, H, W] in [-1, 1], H and W divisible by 16
H, W = images.shape[-2:]
latents = model.encode(images)

# Decode (1 step by default; PSNR-optimal)
recon = model.decode(latents, height=H, width=W)

# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```

Note: Hub downloads require `pip install huggingface_hub safetensors`. You can also pass a local directory path to `from_pretrained()`.

## Architecture

| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |

**Encoder:** Deterministic. Patchify (PixelUnshuffle + 1×1 conv) followed by DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with learned residual gates.
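The DiCo block structure described above can be sketched in PyTorch roughly as follows. The layer ordering, normalization choices, and zero-initialized gates are assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class DiCoBlock(nn.Module):
    """Sketch of a DiCo-style block: depthwise conv + compact channel
    attention + GELU MLP, each branch added back through a learned
    residual gate. Details here are assumptions, not irdiffae's code."""

    def __init__(self, dim: int, mlp_ratio: float = 4.0, kernel: int = 7):
        super().__init__()
        # Depthwise conv: one filter per channel (groups=dim)
        self.dw = nn.Conv2d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        # Compact channel attention: global pool -> 1x1 conv -> sigmoid gate
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim, 1),
            nn.Sigmoid(),
        )
        hidden = int(dim * mlp_ratio)
        # Pointwise GELU MLP
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1),
            nn.GELU(),
            nn.Conv2d(hidden, dim, 1),
        )
        # Learned residual gates, zero-initialized so the block
        # starts as an identity mapping
        self.gate_conv = nn.Parameter(torch.zeros(1, dim, 1, 1))
        self.gate_mlp = nn.Parameter(torch.zeros(1, dim, 1, 1))

    def forward(self, x):
        y = self.dw(x)
        y = y * self.ca(y)          # channel attention reweights features
        x = x + self.gate_conv * y  # gated residual: conv branch
        x = x + self.gate_mlp * self.mlp(x)  # gated residual: MLP branch
        return x
```

Zero-initialized gates are a common trick to make deep residual stacks trainable from scratch: every block is an identity at initialization, and each branch is switched on gradually as its gate learns a nonzero value.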

**Decoder:** VP diffusion conditioned on encoder latents and timestep via shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) → middle blocks (4) → skip fusion → end blocks (2). Supports Path-Drop Guidance (PDG) at inference for a quality/speed tradeoff.
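The shared-base + per-layer low-rank AdaLN-Zero conditioning can be illustrated with a minimal sketch: one base projection maps the conditioning vector to scale/shift/gate modulation, and each layer adds its own rank-r correction. The module name, the three-way scale/shift/gate split, and the initialization follow the common AdaLN-Zero convention and are assumptions about this model's internals:

```python
import torch
import torch.nn as nn

class LowRankAdaLNZero(nn.Module):
    """Sketch of low-rank AdaLN-Zero conditioning (hypothetical names).
    In the full model, `base` would be shared across decoder layers while
    `down`/`up` (rank r, cf. "AdaLN rank 128" above) are per-layer."""

    def __init__(self, cond_dim: int, dim: int, rank: int = 128):
        super().__init__()
        # Shared base: cond -> (scale, shift, gate), packed as 3*dim
        self.base = nn.Linear(cond_dim, 3 * dim)
        # Per-layer low-rank correction: cond -> rank -> 3*dim
        self.down = nn.Linear(cond_dim, rank, bias=False)
        self.up = nn.Linear(rank, 3 * dim, bias=False)
        # AdaLN-Zero init: modulation is exactly zero at start, so the
        # gated branch contributes nothing until training moves it
        nn.init.zeros_(self.base.weight)
        nn.init.zeros_(self.base.bias)
        nn.init.zeros_(self.up.weight)
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)

    def forward(self, x, cond):
        # x: [B, N, dim] tokens; cond: [B, cond_dim] (latent + timestep emb)
        mod = self.base(cond) + self.up(self.down(cond))
        scale, shift, gate = mod.chunk(3, dim=-1)
        h = self.norm(x) * (1 + scale[:, None]) + shift[:, None]
        return gate[:, None] * h  # gate = 0 at init -> branch disabled
```

The low-rank factorization is the point: per-layer modulation costs `cond_dim × r + r × 3·dim` parameters instead of a full `cond_dim × 3·dim` matrix per layer, with the shared base carrying the common structure.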

## Recommended Settings

Best quality is achieved with just 1 DDIM step and PDG disabled, making inference extremely fast. PDG (strength 2-4) can optionally increase perceptual sharpness but is easy to overdo.

| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |

```python
from ir_diffae import IRDiffAEInferenceConfig

# PSNR-optimal (fast, 1 step)
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)
```

## Citation

```bibtex
@misc{irdiffae_v1,
  title   = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
  author  = {data-archetype},
  year    = {2026},
  month   = feb,
  url     = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```

## Dependencies

- PyTorch >= 2.0
- safetensors (for loading weights)
- huggingface_hub (for Hub downloads)

## License

Apache 2.0