|
|
--- |
|
|
license: cc-by-nc-4.0 |
|
|
tags: |
|
|
- vae |
|
|
- image-generation |
|
|
- diffusion |
|
|
- complexity-diffusion |
|
|
library_name: pytorch |
|
|
pipeline_tag: image-to-image |
|
|
--- |
|
|
|
|
|
# Complexity-Diffusion VAE |
|
|
|
|
|
Variational Autoencoder for the Complexity-Diffusion image generation pipeline.
|
|
|
|
|
## Architecture |
|
|
|
|
|
**89M parameters** | 256x256 images | 4-channel latent space |
|
|
|
|
|
### Encoder |
|
|
$$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$$ |
|
|
|
|
|
Compresses 256x256x3 images to 32x32x4 latents (8x spatial compression). |
|
|
|
|
|
### Decoder |
|
|
$$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$$ |
|
|
|
|
|
### Loss Function |
|
|
$$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \| p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$$ |
|
|
|
|
|
Where: |
|
|
- $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction) |
|
|
- $D_{KL}$ regularizes latent to $\mathcal{N}(0, I)$ |
|
|
- $\mathcal{L}_{\text{perceptual}}$ uses VGG features |
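The objective above can be sketched directly in PyTorch. Here `beta`, `lam`, and the `perceptual_fn` callback are illustrative stand-ins, not the released training configuration, and the KL term is averaged over elements rather than summed:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, perceptual_fn, beta=0.25, lam=0.1):
    """Sketch of the combined objective: L1 + beta*KL + lambda*perceptual.

    beta/lam values are illustrative, not this checkpoint's settings.
    """
    # L1 reconstruction term
    recon = F.l1_loss(x_hat, x)
    # KL divergence between q(z|x) = N(mu, sigma^2) and p(z) = N(0, I)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Perceptual distance, e.g. in VGG feature space
    perceptual = perceptual_fn(x_hat, x)
    return recon + beta * kl + lam * perceptual
```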
|
|
|
|
|
## Config |
|
|
|
|
|
| Parameter | Value | |
|
|
|-----------|-------| |
|
|
| Image size | 256x256 | |
|
|
| Latent dim | 4 | |
|
|
| Base channels | 128 | |
|
|
| Channel mult | [1, 2, 4, 4] | |
|
|
| Res blocks | 2 | |
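The channel multipliers imply a per-stage width and resolution schedule. A quick sanity check, assuming one 2x downsample between successive stages (typical for LDM-style VAEs; not taken from the released code):

```python
# Illustrative width/resolution schedule implied by the config table.
base_channels = 128
channel_mult = [1, 2, 4, 4]
image_size = 256

widths = [base_channels * m for m in channel_mult]
resolutions = [image_size // (2 ** i) for i in range(len(channel_mult))]
print(widths)       # [128, 256, 512, 512]
print(resolutions)  # [256, 128, 64, 32]
```

The final resolution of 32 matches the 32x32x4 latent grid described above.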
|
|
|
|
|
## Usage |
|
|
|
|
|
```python
import torch
from safetensors.torch import load_file
from complexity_diffusion.vae import ComplexityVAE

# Load pretrained weights
state_dict = load_file("model.safetensors")
vae = ComplexityVAE(image_size=256, base_channels=128, latent_dim=4)
vae.load_state_dict(state_dict)
vae.eval()

# Encode: expects RGB images in [B, 3, 256, 256]
images = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    latents = vae.encode(images)  # [B, 4, 32, 32]

    # Decode back to image space
    reconstructed = vae.decode(latents)  # [B, 3, 256, 256]
```
|
|
|
|
|
## Training |
|
|
|
|
|
Trained on WikiArt (81K images) for 15K steps with: |
|
|
- Batch size: 16 |
|
|
- Learning rate: 1e-4 |
|
|
- Mixed precision: bf16 |
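A minimal training step consistent with the hyperparameters listed above might look like the following. The optimizer choice, loss callback, and the model's forward signature are assumptions; the actual trainer for this checkpoint is not published here:

```python
import torch
from torch import nn

def train_step(vae, images, loss_fn, opt, device_type="cpu"):
    """One optimization step with bf16 autocast, mirroring the
    listed setup (lr 1e-4, bf16 mixed precision). `loss_fn` stands
    in for the combined VAE objective."""
    with torch.autocast(device_type=device_type, dtype=torch.bfloat16):
        recon = vae(images)          # assumed forward signature
        loss = loss_fn(recon, images)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.detach()
```

With the listed settings, `opt` would be constructed with `lr=1e-4` and `images` would arrive in batches of 16.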
|
|
|
|
|
### Training Curves |
|
|
|
|
|
 |
|
|
|
|
|
## Part of Complexity Deep Ecosystem |
|
|
|
|
|
This VAE is designed to work with the Complexity-Diffusion pipeline, leveraging: |
|
|
- **INL Dynamics** for stable latent space training |
|
|
- **Token-Routed architecture** for efficient processing |
|
|
|
|
|
## Links |
|
|
|
|
|
- [Complexity Deep on Hugging Face](https://huggingface.co/Pacific-Prime)
- [complexity-deep on PyPI](https://pypi.org/project/complexity-deep/)
- [GitHub](https://github.com/Complexity-ML/complexity-framework)
- [complexity-framework on PyPI](https://pypi.org/project/complexity-framework/)
|
|
|
|
|
## License |
|
|
|
|
|
CC BY-NC 4.0 - Attribution-NonCommercial |
|
|
|
|
|
Commercial use requires explicit permission from the author. |
|
|
|