---
license: apache-2.0
tags:
- diffusion
- autoencoder
- image-reconstruction
- pytorch
library_name: irdiffae
---
# data-archetype/irdiffae-v1
**iRDiffAE** – **iR**epa **Diff**usion **A**uto**E**ncoder.
A fast, single-GPU-trainable diffusion autoencoder with spatially structured
latents for rapid downstream model convergence. Encoding runs ~5× faster than
the Flux VAE; single-step decoding runs ~3× faster.
## Model Variants
| Variant | Patch | Channels | Compression | Notes |
|---------|-------|----------|-------------|-------|
| [irdiffae_v1](https://huggingface.co/data-archetype/irdiffae_v1) | 16×16 | 128 | 6× | recommended |
This variant (data-archetype/irdiffae-v1): 121.0M parameters, 461.4 MB.
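The 6× compression figure follows directly from the table: each 16×16 patch of a 3-channel image is mapped to 128 latent channels. A quick back-of-envelope check:

```python
# Values taken from the variant table above.
patch = 16       # spatial patch size
in_ch = 3        # RGB input channels
latent_ch = 128  # bottleneck channels

values_per_patch = patch * patch * in_ch   # 768 input values per patch
compression = values_per_patch / latent_ch # 768 / 128
print(compression)  # -> 6.0
```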
## Documentation
- [Technical Report](technical_report.md) – diffusion math, architecture, training, and results
- [Results – interactive viewer](https://huggingface.co/spaces/data-archetype/irdiffae-results) – full-resolution side-by-side comparison
- [Results – summary stats](technical_report.md#7-results) – metrics and per-image PSNR
## Quick Start
```python
import torch
from ir_diffae import IRDiffAE
# Load from HuggingFace Hub (or a local path)
model = IRDiffAE.from_pretrained("data-archetype/irdiffae-v1", device="cuda")
# Encode
images = ...  # [B, 3, H, W] in [-1, 1], H and W divisible by 16
latents = model.encode(images)
# Decode (1 step by default – PSNR-optimal)
B, _, H, W = images.shape
recon = model.decode(latents, height=H, width=W)
# Reconstruct (encode + 1-step decode)
recon = model.reconstruct(images)
```
> **Note:** Requires `pip install huggingface_hub safetensors` for Hub downloads.
> You can also pass a local directory path to `from_pretrained()`.
## Architecture
| Property | Value |
|---|---|
| Parameters | 120,957,440 |
| File size | 461.4 MB |
| Patch size | 16 |
| Model dim | 896 |
| Encoder depth | 4 |
| Decoder depth | 8 |
| Bottleneck dim | 128 |
| MLP ratio | 4.0 |
| Depthwise kernel | 7 |
| AdaLN rank | 128 |
**Encoder**: Deterministic. Patchify (PixelUnshuffle + 1x1 conv) followed by
DiCo blocks (depthwise conv + compact channel attention + GELU MLP) with
learned residual gates.
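The patchify stem described above can be sketched in a few lines of PyTorch. This is an illustrative reconstruction from the description and the Architecture table (patch 16, model dim 896), not the actual `irdiffae` source:

```python
import torch
import torch.nn as nn

patch, dim = 16, 896  # from the Architecture table

# PixelUnshuffle folds each 16x16 spatial patch into channels,
# then a 1x1 conv projects the stacked patch to the model dim.
stem = nn.Sequential(
    nn.PixelUnshuffle(patch),              # [B, 3, H, W] -> [B, 3*16*16, H/16, W/16]
    nn.Conv2d(3 * patch * patch, dim, 1),  # 1x1 conv: 768 -> 896 channels
)

x = torch.randn(2, 3, 64, 64)
tokens = stem(x)
print(tokens.shape)  # -> torch.Size([2, 896, 4, 4])
```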
**Decoder**: VP diffusion conditioned on encoder latents and timestep via
shared-base + per-layer low-rank AdaLN-Zero. Start blocks (2) -> middle
blocks (4) -> skip fusion -> end blocks (2). Supports
Path-Drop Guidance (PDG) at inference for quality/speed tradeoff.
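The low-rank AdaLN-Zero conditioning can be sketched as follows: a shared conditioning embedding is projected through a rank-128 bottleneck (the "AdaLN rank" row above) into per-layer scale/shift/gate, with the output projection zero-initialized so each layer's modulated branch starts as a no-op. Module names and wiring here are illustrative assumptions, not the actual implementation:

```python
import torch
import torch.nn as nn

class LowRankAdaLNZero(nn.Module):
    """Illustrative per-layer low-rank AdaLN-Zero modulation."""

    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.down = nn.Linear(dim, rank)      # low-rank bottleneck
        self.up = nn.Linear(rank, 3 * dim)    # -> scale, shift, gate
        nn.init.zeros_(self.up.weight)        # "Zero": branch starts disabled
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift, gate = self.up(self.down(cond)).chunk(3, dim=-1)
        # Gated, modulated activations; in the full block these would be
        # added back to the residual stream.
        return gate * (self.norm(x) * (1 + scale) + shift)

dim, rank = 896, 128  # from the Architecture table
block = LowRankAdaLNZero(dim, rank)
x = torch.randn(2, 16, dim)    # [batch, tokens, dim]
cond = torch.randn(2, 1, dim)  # shared timestep + latent embedding
out = block(x, cond)
print(out.shape)  # -> torch.Size([2, 16, 896])
```

With the zero-initialized output projection, `out` is exactly zero at initialization, so training starts from an identity-like decoder path.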
## Recommended Settings
Best quality is achieved with just **1 DDIM step** and PDG disabled,
making inference extremely fast. PDG (strength 2-4) can optionally
increase perceptual sharpness but is easy to overdo.
| Setting | Default |
|---|---|
| Sampler | DDIM |
| Steps | 1 |
| PDG | Disabled |
```python
from ir_diffae import IRDiffAEInferenceConfig
# PSNR-optimal (fast, 1 step)
cfg = IRDiffAEInferenceConfig(num_steps=1, sampler="ddim")
recon = model.decode(latents, height=H, width=W, inference_config=cfg)  # H, W from the input images, as in Quick Start
```
## Citation
```bibtex
@misc{irdiffae_v1,
title = {iRDiffAE: A Fast, Representation Aligned Diffusion Autoencoder with DiCo Blocks},
author = {data-archetype},
year = {2026},
month = feb,
url = {https://huggingface.co/data-archetype/irdiffae-v1},
}
```
## Dependencies
- PyTorch >= 2.0
- safetensors (for loading weights)
- huggingface_hub (for Hub downloads)
## License
Apache 2.0