# INL-Diffusion VAE
Variational autoencoder (VAE) for the INL-Diffusion image generation pipeline.
## Architecture
**89M parameters** | 256x256 images | 4-channel latent space
### Encoder
$$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$$
Compresses 256x256x3 images to 32x32x4 latents (8x compression along each spatial dimension).
### Decoder
$$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$$
Reconstructs 256x256x3 images from 32x32x4 latents.
### Loss Function
$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \| p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$$
Where:
- $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
- $D_{KL}$ regularizes the approximate posterior $q(z|x)$ toward the prior $p(z) = \mathcal{N}(0, I)$
- $\mathcal{L}_{\text{perceptual}}$ compares VGG features of $x$ and $\hat{x}$
- $\beta$ and $\lambda$ are scalar weights on the KL and perceptual terms
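
The KL term can be written out explicitly under one assumption that this card does not state: that the encoder outputs a diagonal Gaussian posterior $q(z|x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$, which is the usual choice for VAEs. In that case the divergence from $\mathcal{N}(0, I)$ has the closed form

$$D_{KL}(q(z|x) \| p(z)) = \frac{1}{2} \sum_{i} \left( \mu_i^2 + \sigma_i^2 - \log \sigma_i^2 - 1 \right)$$

where $i$ runs over the $32 \cdot 32 \cdot 4 = 4096$ latent elements. Under the same assumption, a latent is sampled during training with the reparameterization

$$z = \mu + \sigma \odot \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$

The symbols $\mu$, $\sigma$, and $\epsilon$ are introduced here for illustration and do not appear in the original description.
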
## Config
| Parameter | Value |
|-----------|-------|
| Image size | 256x256 |
| Latent dim | 4 |
| Base channels | 128 |
| Channel mult | [1, 2, 4, 4] |
| Res blocks | 2 |
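
The table values are consistent with the 32x32x4 latent shape given above, which the following sketch checks. It assumes the common layout in which each entry of the channel multiplier is one resolution level, with a 2x downsample between consecutive levels and none after the last; the actual `INLVAE` layer layout is not documented here, so treat the variable names and that rule as illustrative.

```python
# Values taken from the Config table.
image_size = 256
base_channels = 128
channel_mult = [1, 2, 4, 4]
latent_dim = 4

# Channel width at each resolution level.
level_channels = [base_channels * m for m in channel_mult]

# Assumption: one 2x downsample between consecutive levels, none after the last.
num_downsamples = len(channel_mult) - 1
downsample_factor = 2 ** num_downsamples
latent_size = image_size // downsample_factor

# Overall reduction in the number of values per image.
input_values = image_size * image_size * 3
latent_values = latent_size * latent_size * latent_dim

print(level_channels)                  # [128, 256, 512, 512]
print(downsample_factor)               # 8
print(latent_size)                     # 32
print(input_values // latent_values)   # 48
```

Under this assumption, four levels give three downsampling stages, which matches the stated 8x spatial compression; the latent holds 48x fewer values than the input image.
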
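## Usage
```python
import torch
from safetensors.torch import load_file
from inl_diffusion.vae import INLVAE

# Build the model with the configuration above, then load the weights
state_dict = load_file("model.safetensors")
vae = INLVAE(image_size=256, base_channels=128, latent_dim=4)
vae.load_state_dict(state_dict)
vae.eval()

# Placeholder batch; replace with real images of shape [B, 3, 256, 256]
images = torch.randn(1, 3, 256, 256)

with torch.no_grad():
    # Encode
    latents = vae.encode(images)  # [B, 4, 32, 32]
    # Decode
    reconstructed = vae.decode(latents)  # [B, 3, 256, 256]
```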
## Training
Trained on WikiArt (81K images) for 15K steps with:
- Batch size: 16
- Learning rate: 1e-4
- Mixed precision: bf16
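
For a rough sense of scale, the figures above can be turned into a number of passes over the dataset. This is back-of-the-envelope arithmetic only: it assumes each step processes exactly one batch of 16 (no gradient accumulation or multi-device scaling, neither of which is stated) and treats "81K" as exactly 81,000 images.

```python
# Values taken from the Training section.
steps = 15_000
batch_size = 16
dataset_size = 81_000  # "81K" treated as exactly 81,000

# Assumption: one batch of 16 per step (no gradient accumulation, single device).
samples_seen = steps * batch_size
epochs = samples_seen / dataset_size

print(samples_seen)       # 240000
print(round(epochs, 2))   # 2.96
```

Under those assumptions the run covers roughly three passes over WikiArt.
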
## License
Apache 2.0