---
license: apache-2.0
tags:
  - diffusion
  - autoencoder
  - image-reconstruction
  - pytorch
library_name: irdiffae
---

# data-archetype/irdiffae-v1

**iRDiffAE** (iRepa Diffusion AutoEncoder) is a fast, single-GPU-trainable diffusion autoencoder with spatially structured latents for rapid downstream model convergence. Encoding runs ~5× faster than the Flux VAE; single-step decoding runs ~3× faster.

## Model Variants

| Variant | Patch size | Channels | Compression |
|---|---|---|---|
| `irdiffae_v1` (recommended) | 16×16 | 128 | 6× |

This variant (`data-archetype/irdiffae-v1`): 121.0M parameters, 461.4 MB.
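To see where the 6× figure comes from, here is a back-of-envelope check. It assumes the latent keeps one 128-dim vector per 16×16 patch, i.e. a `[128, H/16, W/16]` grid (the exact latent layout is an assumption, inferred from the patch size and bottleneck dim above):

```python
# Hypothetical arithmetic sketch, not part of the ir_diffae API.
# Each 16x16 RGB patch holds 16 * 16 * 3 = 768 values; the encoder
# maps it to a single 128-channel latent vector.
patch, in_ch, bottleneck = 16, 3, 128
compression = (patch * patch * in_ch) / bottleneck
print(compression)  # 6.0

# Assumed latent grid for a 512x512 input: one vector per patch.
H = W = 512
latent_shape = (bottleneck, H // patch, W // patch)
print(latent_shape)  # (128, 32, 32)
```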

## Documentation

### Quick Start

```python
import torch
from ir_diffae import IRDiffAE

# Load from the Hugging Face Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")

# Encode
images = ...  # [B, 3, H, W] in [-1, 1], H and W divisible by 16
H, W = images.shape[-2:]
latents = model.encode(images)

# Decode (1 step by default; PSNR-optimal)
recon = model.decode(latents, height=H, width=W)

# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```

Note: Hub downloads require `pip install huggingface_hub safetensors`. You can also pass a local directory path to `from_pretrained()`.

## Architecture

| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |

**Encoder:** Deterministic. Patchify (PixelUnshuffle + 1×1 conv) followed by DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with learned residual gates.
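The DiCo block structure described above can be sketched in PyTorch roughly as follows. The layer ordering, normalization choices, and zero-initialized gates are assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class DiCoBlock(nn.Module):
    """Sketch of a DiCo-style block: depthwise conv + compact channel
    attention + GELU MLP, each branch added back through a learned
    residual gate. Details here are assumptions, not irdiffae's code."""

    def __init__(self, dim: int, mlp_ratio: float = 4.0, kernel: int = 7):
        super().__init__()
        # Depthwise conv: one filter per channel (groups=dim)
        self.dw = nn.Conv2d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        # Compact channel attention: global pool -> 1x1 conv -> sigmoid gate
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dim, dim, 1),
            nn.Sigmoid(),
        )
        hidden = int(dim * mlp_ratio)
        # Pointwise GELU MLP
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1),
            nn.GELU(),
            nn.Conv2d(hidden, dim, 1),
        )
        # Learned residual gates, zero-initialized so the block
        # starts as an identity mapping
        self.gate_conv = nn.Parameter(torch.zeros(1, dim, 1, 1))
        self.gate_mlp = nn.Parameter(torch.zeros(1, dim, 1, 1))

    def forward(self, x):
        y = self.dw(x)
        y = y * self.ca(y)          # channel attention reweights features
        x = x + self.gate_conv * y  # gated residual: conv branch
        x = x + self.gate_mlp * self.mlp(x)  # gated residual: MLP branch
        return x
```

Zero-initialized gates are a common trick to make deep residual stacks trainable from scratch: every block is an identity at initialization, and each branch is switched on gradually as its gate learns a nonzero value.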

**Decoder:** VP diffusion conditioned on encoder latents and timestep via shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) → middle blocks (4) → skip fusion → end blocks (2). Supports Path-Drop Guidance (PDG) at inference for a quality/speed tradeoff.
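The shared-base + per-layer low-rank AdaLN-Zero conditioning can be illustrated with a minimal sketch: one base projection maps the conditioning vector to scale/shift/gate modulation, and each layer adds its own rank-r correction. The module name, the three-way scale/shift/gate split, and the initialization follow the common AdaLN-Zero convention and are assumptions about this model's internals:

```python
import torch
import torch.nn as nn

class LowRankAdaLNZero(nn.Module):
    """Sketch of low-rank AdaLN-Zero conditioning (hypothetical names).
    In the full model, `base` would be shared across decoder layers while
    `down`/`up` (rank r, cf. "AdaLN rank 128" above) are per-layer."""

    def __init__(self, cond_dim: int, dim: int, rank: int = 128):
        super().__init__()
        # Shared base: cond -> (scale, shift, gate), packed as 3*dim
        self.base = nn.Linear(cond_dim, 3 * dim)
        # Per-layer low-rank correction: cond -> rank -> 3*dim
        self.down = nn.Linear(cond_dim, rank, bias=False)
        self.up = nn.Linear(rank, 3 * dim, bias=False)
        # AdaLN-Zero init: modulation is exactly zero at start, so the
        # gated branch contributes nothing until training moves it
        nn.init.zeros_(self.base.weight)
        nn.init.zeros_(self.base.bias)
        nn.init.zeros_(self.up.weight)
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)

    def forward(self, x, cond):
        # x: [B, N, dim] tokens; cond: [B, cond_dim] (latent + timestep emb)
        mod = self.base(cond) + self.up(self.down(cond))
        scale, shift, gate = mod.chunk(3, dim=-1)
        h = self.norm(x) * (1 + scale[:, None]) + shift[:, None]
        return gate[:, None] * h  # gate = 0 at init -> branch disabled
```

The low-rank factorization is the point: per-layer modulation costs `cond_dim × r + r × 3·dim` parameters instead of a full `cond_dim × 3·dim` matrix per layer, with the shared base carrying the common structure.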

## Recommended Settings

Best quality is achieved with just 1 DDIM step and PDG disabled, making inference extremely fast. PDG (strength 2-4) can optionally increase perceptual sharpness but is easy to overdo.

| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |

```python
from ir_diffae import IRDiffAEInferenceConfig

# PSNR-optimal (fast, 1 step)
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)
```

## Citation

```bibtex
@misc{irdiffae_v1,
  title   = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
  author  = {data-archetype},
  year    = {2026},
  month   = feb,
  url     = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```

## Dependencies

- PyTorch >= 2.0
- safetensors (for loading weights)
- huggingface_hub (for Hub downloads)

## License

Apache 2.0