---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- pytorch
library_name: irdiffae
---

# data-archetype/irdiffae-v1

**iRDiffAE**: **iR**epa **Diff**usion **A**uto**E**ncoder.
A fast, single-GPU-trainable diffusion autoencoder with spatially structured
latents for rapid downstream model convergence. Encoding runs ~5× faster than
the Flux VAE; single-step decoding runs ~3× faster.

## Model Variants

| Variant | Patch | Channels | Compression | Notes |
|---------|-------|----------|-------------|-------|
| [irdiffae_v1](https://huggingface.co/data-archetype/irdiffae_v1) | 16x16 | 128 | 6x | recommended |

This variant (data-archetype/irdiffae-v1) has 121.0M parameters (461.4 MB).
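
The 6x compression figure follows directly from the table: each 16x16 RGB patch carries 3 · 16 · 16 = 768 pixel values and is mapped to a single 128-channel latent vector. A quick sanity check:

```python
# Compression check for irdiffae_v1: one 16x16 RGB patch -> 128 latent channels.
patch_size, rgb_channels, latent_channels = 16, 3, 128

values_per_patch = rgb_channels * patch_size * patch_size  # 768 pixel values
compression = values_per_patch / latent_channels

print(compression)  # 6.0
```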

## Documentation

- [Technical Report](technical_report.md) – diffusion math, architecture, training, and results
- [Results – interactive viewer](https://huggingface.co/spaces/data-archetype/irdiffae-results) – full-resolution side-by-side comparison
- [Results – summary stats](technical_report.md#7-results) – metrics and per-image PSNR

## Quick Start

```python
import torch
from ir_diffae import IRDiffAE

# Load from the Hugging Face Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")

# Encode: images must be [B, 3, H, W] in [-1, 1], with H and W divisible by 16
B, H, W = 1, 256, 256
images = torch.rand(B, 3, H, W, device="cuda") * 2 - 1  # stand-in for real images
latents = model.encode(images)

# Decode (1 step by default; PSNR-optimal)
recon = model.decode(latents, height=H, width=W)

# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```

> **Note:** Hub downloads require `pip install huggingface_hub safetensors`.
> You can also pass a local directory path to `from_pretrained()`.

## Architecture

| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |

**Encoder**: Deterministic. Patchify (PixelUnshuffle + 1x1 conv) followed by
DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with
learned residual gates.
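
As a rough illustration, a DiCo block of the kind described above could look like the following PyTorch sketch. The layer ordering, the squeeze-and-excite-style channel attention, and the scalar residual gates are assumptions for illustration, not the released implementation:

```python
import torch
import torch.nn as nn

class DiCoBlock(nn.Module):
    """Illustrative sketch of a DiCo block: depthwise conv + compact
    channel attention + GELU MLP, each branch behind a learned,
    zero-initialized residual gate (details are assumptions)."""

    def __init__(self, dim: int, kernel: int = 7, mlp_ratio: float = 4.0):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        # Compact channel attention: global pool -> 1x1 conv -> sigmoid scale
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim, 1), nn.Sigmoid()
        )
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, hidden, 1), nn.GELU(), nn.Conv2d(hidden, dim, 1)
        )
        # Learned residual gates, zero-initialized so the block starts as identity
        self.gate1 = nn.Parameter(torch.zeros(1))
        self.gate2 = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.dwconv(x)
        y = y * self.attn(y)          # rescale channels
        x = x + self.gate1 * y        # gated residual branch 1
        return x + self.gate2 * self.mlp(x)  # gated residual branch 2
```

With the gates zero-initialized, every block starts as the identity map, which tends to stabilize training of deep residual stacks.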

**Decoder**: VP diffusion conditioned on encoder latents and timestep via
shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) -> middle
blocks (4) -> skip fusion -> end blocks (2). Supports
Path-Drop Guidance (PDG) at inference for a quality/speed tradeoff.

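The shared-base + per-layer low-rank conditioning can be sketched as follows. `LowRankAdaLN`, the projection layout, and the zero-initialization details are illustrative assumptions; in the real model the base projection would be shared across all decoder layers, with each layer owning only its rank-128 delta:

```python
import torch
import torch.nn as nn

class LowRankAdaLN(nn.Module):
    """Illustrative sketch: the conditioning vector (latents + timestep
    embedding) produces shift/scale/gate via a shared base projection
    plus a per-layer low-rank correction, all zero-initialized."""

    def __init__(self, dim: int, cond_dim: int, rank: int = 128):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.base = nn.Linear(cond_dim, 3 * dim)           # shared across layers
        self.down = nn.Linear(cond_dim, rank, bias=False)  # per-layer, rank-r
        self.up = nn.Linear(rank, 3 * dim, bias=False)
        # "-Zero": modulation starts at zero, so each block begins inert
        nn.init.zeros_(self.base.weight)
        nn.init.zeros_(self.base.bias)
        nn.init.zeros_(self.up.weight)

    def forward(self, x: torch.Tensor, cond: torch.Tensor):
        mod = self.base(cond) + self.up(self.down(cond))
        shift, scale, gate = mod.chunk(3, dim=-1)
        # Modulated activations plus a gate for the surrounding residual branch
        return self.norm(x) * (1 + scale) + shift, gate
```

The low-rank split keeps the per-layer conditioning cost at O(rank · dim) parameters instead of O(cond_dim · dim) per layer.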
## Recommended Settings

Best quality is achieved with just **1 DDIM step** and PDG disabled,
making inference extremely fast. PDG (strength 2-4) can optionally
increase perceptual sharpness but is easy to overdo.

| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |

```python
from ir_diffae import IRDiffAEInferenceConfig

# PSNR-optimal (fast, 1 step); `model` and `latents` are from the Quick Start
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)
```

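For intuition on why a single step can be PSNR-optimal: with one step, the deterministic DDIM update collapses to inverting the VP forward process with the model's noise estimate, i.e. it returns the predicted clean image directly. A library-independent sketch (`eps_model` and `alpha_bar_t` are generic stand-ins, not part of the `ir_diffae` API):

```python
import torch

def ddim_single_step(eps_model, x_t: torch.Tensor, t: float,
                     alpha_bar_t: float) -> torch.Tensor:
    """One deterministic DDIM step straight to t = 0: recover the predicted
    clean sample via standard VP algebra,
    x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t)."""
    eps = eps_model(x_t, t)
    a = torch.as_tensor(alpha_bar_t)
    return (x_t - torch.sqrt(1.0 - a) * eps) / torch.sqrt(a)
```

If the noise estimate is exact, this recovers x0 in closed form; extra steps only help to the extent the estimate improves along the trajectory.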
## Citation

```bibtex
@misc{irdiffae_v1,
  title = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
  author = {data-archetype},
  year = {2026},
  month = feb,
  url = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```

## Dependencies

- PyTorch >= 2.0
- safetensors (for loading weights)

## License

Apache 2.0