---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- pytorch
library_name: irdiffae
---
# data-archetype/irdiffae-v1
**iRDiffAE** – **iR**epa **Diff**usion **A**uto**E**ncoder.
A fast, single-GPU-trainable diffusion autoencoder with spatially structured
latents for rapid downstream model convergence. Encoding runs ~5× faster than
the Flux VAE; single-step decoding runs ~3× faster.
## Model Variants
| Variant | Patch | Channels | Compression | Notes |
|---------|-------|----------|-------------|-------|
| [irdiffae_v1](https://huggingface.co/data-archetype/irdiffae_v1) | 16×16 | 128 | 6× | recommended |
This variant (data-archetype/irdiffae-v1): 121.0M parameters, 461.4 MB.
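The 6× compression figure follows directly from the table: each 16×16 patch of a 3-channel image is mapped to 128 latent channels. A quick back-of-envelope check:

```python
# Values taken from the variant table above.
patch = 16       # spatial patch size
in_ch = 3        # RGB input channels
latent_ch = 128  # bottleneck channels

values_per_patch = patch * patch * in_ch   # 768 input values per patch
compression = values_per_patch / latent_ch # 768 / 128
print(compression)  # -> 6.0
```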
## Documentation
- [Technical Report](technical_report.md) – diffusion math, architecture, training, and results
- [Results – interactive viewer](https://huggingface.co/spaces/data-archetype/irdiffae-results) – full-resolution side-by-side comparison
- [Results – summary stats](technical_report.md#7-results) – metrics and per-image PSNR
## Quick Start
```python
import torch
from ir_diffae import IRDiffAE
# Load from HuggingFace Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")
# Encode
images = ...  # [B, 3, H, W] in [-1, 1], H and W divisible by 16
latents = model.encode(images)
# Decode (1 step by default – PSNR-optimal)
B, _, H, W = images.shape
recon = model.decode(latents, height=H, width=W)
# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```
> **Note:** Requires `pip install huggingface_hub safetensors` for Hub downloads.
> You can also pass a local directory path to `from_pretrained()`.
## Architecture
| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |
**Encoder**: Deterministic. Patchify (PixelUnshuffle + 1x1 conv) followed by
DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with
learned residual gates.
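The patchify stem described above can be sketched in a few lines of PyTorch. This is an illustrative reconstruction from the description and the Architecture table (patch 16, model dim 896), not the actual `irdiffae` source:

```python
import torch
import torch.nn as nn

patch, dim = 16, 896  # from the Architecture table

# PixelUnshuffle folds each 16x16 spatial patch into channels,
# then a 1x1 conv projects the stacked patch to the model dim.
stem = nn.Sequential(
    nn.PixelUnshuffle(patch),              # [B, 3, H, W] -> [B, 3*16*16, H/16, W/16]
    nn.Conv2d(3 * patch * patch, dim, 1),  # 1x1 conv: 768 -> 896 channels
)

x = torch.randn(2, 3, 64, 64)
tokens = stem(x)
print(tokens.shape)  # -> torch.Size([2, 896, 4, 4])
```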
**Decoder**: VP diffusion conditioned on encoder latents and timestep via
shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) -> middle
blocks (4) -> skip fusion -> end blocks (2). Supports
Path-Drop Guidance (PDG) at inference for quality/speed tradeoff.
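The low-rank AdaLN-Zero conditioning can be sketched as follows: a shared conditioning embedding is projected through a rank-128 bottleneck (the "AdaLN rank" row above) into per-layer scale/shift/gate, with the output projection zero-initialized so each layer's modulated branch starts as a no-op. Module names and wiring here are illustrative assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn

class LowRankAdaLNZero(nn.Module):
    """Illustrative per-layer low-rank AdaLN-Zero modulation."""

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.down = nn.Linear(dim, rank)      # low-rank bottleneck
        self.up = nn.Linear(rank, 3 * dim)    # -> scale, shift, gate
        nn.init.zeros_(self.up.weight)        # "Zero": branch starts disabled
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift, gate = self.up(self.down(cond)).chunk(3, dim=-1)
        # Gated, modulated activations; in the full block these would be
        # added back to the residual stream.
        return gate * (self.norm(x) * (1 + scale) + shift)

dim, rank = 896, 128  # from the Architecture table
block = LowRankAdaLNZero(dim, rank)
x = torch.randn(2, 16, dim)    # [batch, tokens, dim]
cond = torch.randn(2, 1, dim)  # shared timestep + latent embedding
out = block(x, cond)
print(out.shape)  # -> torch.Size([2, 16, 896])
```

With the zero-initialized output projection, `out` is exactly zero at initialization, so training starts from an identity-like decoder path.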
## Recommended Settings
Best quality is achieved with just **1 DDIM step** and PDG disabled,
making inference extremely fast. PDG (strength 2-4) can optionally
increase perceptual sharpness but is easy to overdo.
| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |
```python
from ir_diffae import IRDiffAEInferenceConfig
# PSNR-optimal (fast, 1 step)
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)  # H, W from the input images, as in Quick Start
```
## Citation
```bibtex
@misc{irdiffae_v1,
title = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
author = {data-archetype},
year = {2026},
month = feb,
url = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```
## Dependencies
- PyTorch >= 2.0
- safetensors (for loading weights)
- huggingface_hub (for Hub downloads)
## License
Apache 2.0