|
|
--- |
|
|
license: cc-by-nc-4.0 |
|
|
tags: |
|
|
- vae |
|
|
- image-generation |
|
|
- diffusion |
|
|
- complexity-diffusion |
|
|
library_name: pytorch |
|
|
pipeline_tag: image-to-image |
|
|
--- |
|
|
|
|
|
# Complexity-Diffusion VAE |
|
|
|
|
|
Variational Autoencoder for the Complexity-Diffusion image generation pipeline.
|
|
|
|
|
## Architecture |
|
|
|
|
|
**89M parameters** | 256x256 images | 4-channel latent space |
|
|
|
|
|
### Encoder |
|
|
$$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$$ |
|
|
|
|
|
Compresses 256x256x3 images to 32x32x4 latents (8x spatial compression). |
|
|
|
|
|
### Decoder |
|
|
$$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$$ |
|
|
|
|
|
### Loss Function |
|
|
$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \| p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$$ |
|
|
|
|
|
Where: |
|
|
- $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction) |
|
|
- $D_{KL}$ regularizes latent to $\mathcal{N}(0, I)$ |
|
|
- $\mathcal{L}_{\text{perceptual}}$ uses VGG features |
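The objective above can be sketched directly in PyTorch. Here `beta`, `lam`, and the `perceptual_fn` callback are illustrative stand-ins, not the released training configuration, and the KL term is averaged over elements rather than summed:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, perceptual_fn, beta=0.25, lam=0.1):
    """Sketch of the combined objective: L1 + beta*KL + lambda*perceptual.

    beta/lam values are illustrative, not this checkpoint's settings.
    """
    # L1 reconstruction term
    recon = F.l1_loss(x_hat, x)
    # KL divergence between q(z|x) = N(mu, sigma^2) and p(z) = N(0, I)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Perceptual distance, e.g. in VGG feature space
    perceptual = perceptual_fn(x_hat, x)
    return recon + beta * kl + lam * perceptual
```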
|
|
|
|
|
## Config |
|
|
|
|
|
| Parameter | Value | |
|
|
|-----------|-------| |
|
|
| Image size | 256x256 | |
|
|
| Latent dim | 4 | |
|
|
| Base channels | 128 | |
|
|
| Channel mult | [1, 2, 4, 4] | |
|
|
| Res blocks | 2 | |
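The channel multipliers imply a per-stage width and resolution schedule. A quick sanity check, assuming one 2x downsample between successive stages (typical for LDM-style VAEs; not taken from the released code):

```python
# Illustrative width/resolution schedule implied by the config table.
base_channels = 128
channel_mult = [1, 2, 4, 4]
image_size = 256

widths = [base_channels * m for m in channel_mult]
resolutions = [image_size // (2 ** i) for i in range(len(channel_mult))]
print(widths)       # [128, 256, 512, 512]
print(resolutions)  # [256, 128, 64, 32]
```

The final resolution of 32 matches the 32x32x4 latent grid described above.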
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
import torch
from safetensors.torch import load_file
from complexity_diffusion.vae import ComplexityVAE

# Load pretrained weights
state_dict = load_file("model.safetensors")
vae = ComplexityVAE(image_size=256, base_channels=128, latent_dim=4)
vae.load_state_dict(state_dict)
vae.eval()

# Encode: expects RGB images in [B, 3, 256, 256]
images = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    latents = vae.encode(images)  # [B, 4, 32, 32]

    # Decode back to image space
    reconstructed = vae.decode(latents)  # [B, 3, 256, 256]
```
|
|
|
|
|
## Training |
|
|
|
|
|
Trained on WikiArt (81K images) for 15K steps with: |
|
|
- Batch size: 16 |
|
|
- Learning rate: 1e-4 |
|
|
- Mixed precision: bf16 |
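A minimal training step consistent with the hyperparameters listed above might look like the following. The optimizer choice, loss callback, and the model's forward signature are assumptions; the actual trainer for this checkpoint is not published here:

```python
import torch
from torch import nn

def train_step(vae, images, loss_fn, opt, device_type="cpu"):
    """One optimization step with bf16 autocast, mirroring the
    listed setup (lr 1e-4, bf16 mixed precision). `loss_fn` stands
    in for the combined VAE objective."""
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        recon = vae(images)          # assumed forward signature
        loss = loss_fn(recon, images)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.detach()
```

With the listed settings, `opt` would be constructed with `lr=1e-4` and `images` would arrive in batches of 16.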
|
|
|
|
|
### Training Curves |
|
|
|
|
|
 |
|
|
|
|
|
## Part of Complexity Deep Ecosystem |
|
|
|
|
|
This VAE is designed to work with the Complexity-Diffusion pipeline, leveraging: |
|
|
- **INL Dynamics** for stable latent space training |
|
|
- **Token-Routed architecture** for efficient processing |
|
|
|
|
|
## Links |
|
|
|
|
|
- [Complexity Deep on Hugging Face](https://huggingface.co/Pacific-Prime)
- [complexity-deep on PyPI](https://pypi.org/project/complexity-deep/)
- [GitHub](https://github.com/Complexity-ML/complexity-framework)
- [complexity-framework on PyPI](https://pypi.org/project/complexity-framework/)
|
|
|
|
|
## License |
|
|
|
|
|
CC BY-NC 4.0 - Attribution-NonCommercial |
|
|
|
|
|
Commercial use requires explicit permission from the author. |
|
|
|