---
license: cc-by-nc-4.0
tags:
  - vae
  - image-generation
  - diffusion
  - complexity-diffusion
library_name: pytorch
pipeline_tag: image-to-image
---

# Complexity-Diffusion VAE

A variational autoencoder (VAE) for the Complexity-Diffusion image generation pipeline.

## Architecture

**89M parameters** | 256x256 images | 4-channel latent space

### Encoder
$$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$$

Compresses 256x256x3 images to 32x32x4 latents (8x spatial compression).

### Decoder
$$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$$

### Loss Function
$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \| p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$$

Where:
- $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
- $D_{KL}$ regularizes latent to $\mathcal{N}(0, I)$
- $\mathcal{L}_{\text{perceptual}}$ uses VGG features
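
The objective above can be sketched in PyTorch. This is an illustrative implementation, not the repository's training code: the names `mu`/`logvar` assume the encoder parameterizes a diagonal Gaussian posterior, and the perceptual term is left as a pluggable callable (e.g. a VGG feature loss).

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, perceptual_fn=None, beta=1e-4, lam=0.1):
    """Sketch of L = L_recon + beta * KL + lambda * L_perceptual."""
    # L1 reconstruction term
    recon = F.l1_loss(x_hat, x)
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Perceptual term (VGG features in the actual model); zero if not supplied
    perceptual = perceptual_fn(x, x_hat) if perceptual_fn is not None else x.new_zeros(())
    return recon + beta * kl + lam * perceptual
```

The weights `beta` and `lam` here are placeholders; the values used in training are not documented.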

## Config

| Parameter | Value |
|-----------|-------|
| Image size | 256x256 |
| Latent dim | 4 |
| Base channels | 128 |
| Channel mult | [1, 2, 4, 4] |
| Res blocks | 2 |

## Usage

```python
import torch
from safetensors.torch import load_file
from complexity_diffusion.vae import ComplexityVAE

# Load weights
state_dict = load_file("model.safetensors")
vae = ComplexityVAE(image_size=256, base_channels=128, latent_dim=4)
vae.load_state_dict(state_dict)
vae.eval()

# Encode: images is a [B, 3, 256, 256] tensor
with torch.no_grad():
    latents = vae.encode(images)         # [B, 4, 32, 32]

    # Decode back to image space
    reconstructed = vae.decode(latents)  # [B, 3, 256, 256]
```
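
The `images` tensor is expected as a float batch of shape `[B, 3, 256, 256]`. A minimal preprocessing sketch, assuming the model takes inputs normalized to `[-1, 1]` (a common VAE convention; the actual expected range is not documented):

```python
import torch
import torch.nn.functional as F

def preprocess(images_uint8):
    """uint8 [B, 3, H, W] → float32 [B, 3, 256, 256] in [-1, 1] (assumed range)."""
    x = images_uint8.float() / 255.0                  # scale to [0, 1]
    x = F.interpolate(x, size=(256, 256),
                      mode="bilinear",
                      align_corners=False)            # resize to model resolution
    return x * 2.0 - 1.0                              # shift to [-1, 1]
```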

## Training

Trained on WikiArt (81K images) for 15K steps with:
- Batch size: 16
- Learning rate: 1e-4
- Mixed precision: bf16
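
A hypothetical single optimization step reflecting these settings (Adam at 1e-4, bf16 autocast); the card does not include the training code, so everything below is illustrative:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x, device_type="cpu"):
    """One step: forward pass and loss in bf16 autocast, fp32 parameter update."""
    optimizer.zero_grad()
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        x_hat = model(x)
        loss = F.l1_loss(x_hat, x)  # reconstruction-only stand-in for the full loss
    loss.backward()                 # gradients computed outside the autocast region
    optimizer.step()
    return loss.item()
```

On GPU, `device_type="cuda"` selects the bf16 path reported above.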

### Training Curves

![Training Curves](training_curves.png)

## Part of Complexity Deep Ecosystem

This VAE is designed to work with the Complexity-Diffusion pipeline, leveraging:
- **INL Dynamics** for stable latent space training
- **Token-Routed architecture** for efficient processing

## Links

- [Complexity Deep](https://huggingface.co/Pacific-Prime)
- [PyPI Package](https://pypi.org/project/complexity-deep/)
- [GitHub](https://github.com/Complexity-ML/complexity-framework)
- [PyPI](https://pypi.org/project/complexity-framework/)

## License

CC BY-NC 4.0 - Attribution-NonCommercial

Commercial use requires explicit permission from the author.