Pacific-Prime committed
Commit 32c1110 · verified · 1 Parent(s): c03fd2b

Upload README.md with huggingface_hub

Files changed (1): README.md (+62, -3)
README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: cc-by-nc-4.0
- ---
+ # INL-Diffusion VAE
+
+ Variational Autoencoder for the INL-Diffusion image generation pipeline.
+
+ ## Architecture
+
+ **89M parameters** | 256x256 images | 4-channel latent space
+
+ ### Encoder
+ $$z = \mathcal{E}(x) \in \mathbb{R}^{32 \times 32 \times 4}$$
+
+ Compresses 256x256x3 images to 32x32x4 latents (8x spatial compression).
+
+ ### Decoder
+ $$\hat{x} = \mathcal{D}(z) \in \mathbb{R}^{256 \times 256 \times 3}$$
+
+ ### Loss Function
+ $$\mathcal{L} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{KL}(q(z|x) \| p(z)) + \lambda \cdot \mathcal{L}_{\text{perceptual}}$$
+
+ Where:
+ - $\mathcal{L}_{\text{recon}} = \|x - \hat{x}\|_1$ (L1 reconstruction)
+ - $D_{KL}$ regularizes the latent toward $\mathcal{N}(0, I)$
+ - $\mathcal{L}_{\text{perceptual}}$ uses VGG features
+
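The three terms can be sketched numerically. The following is an illustrative NumPy sketch, not the training code: the closed-form Gaussian KL is standard, but `beta`, `lam`, the dummy tensors, and the linear stand-in for the VGG feature extractor are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy batch: B=2 images and slightly perturbed "reconstructions".
x = rng.standard_normal((2, 3, 256, 256))
x_hat = x + 0.1 * rng.standard_normal(x.shape)

# Encoder outputs: mean and log-variance of q(z|x), each [B, 4, 32, 32].
mu = 0.1 * rng.standard_normal((2, 4, 32, 32))
logvar = 0.1 * rng.standard_normal((2, 4, 32, 32))

beta, lam = 1e-6, 0.5  # hypothetical weights, not values from this card

# L1 reconstruction term
l_recon = np.abs(x - x_hat).mean()

# Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), averaged over the batch
d_kl = 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum(axis=(1, 2, 3)).mean()

# Stand-in for the VGG perceptual term: squared distance in a fixed random
# linear feature space (a real implementation compares VGG activations).
W = rng.standard_normal((16, 3 * 256 * 256)) / np.sqrt(3 * 256 * 256)
feat = lambda t: t.reshape(t.shape[0], -1) @ W.T
l_perceptual = ((feat(x) - feat(x_hat)) ** 2).mean()

loss = l_recon + beta * d_kl + lam * l_perceptual
```

Note that the per-element KL integrand `exp(logvar) + mu**2 - 1 - logvar` is always non-negative, so the regularizer can only pull the posterior toward the prior.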
+ ## Config
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Image size | 256x256 |
+ | Latent dim | 4 |
+ | Base channels | 128 |
+ | Channel mult | [1, 2, 4, 4] |
+ | Res blocks | 2 |
+
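The spatial arithmetic implied by this table can be checked in a few lines. The stage layout below (one 2x downsample between successive channel-multiplier stages, as in common latent-diffusion VAEs) is an assumption, not read from the model code:

```python
# Config values from the table above.
image_size, base_channels, latent_dim = 256, 128, 4
channel_mult = [1, 2, 4, 4]

size = image_size
stages = []
for i, mult in enumerate(channel_mult):
    stages.append((size, base_channels * mult))
    if i < len(channel_mult) - 1:
        size //= 2  # assumed strided-conv downsample between stages

print(stages)  # [(256, 128), (128, 256), (64, 512), (32, 512)]
print(size, latent_dim)  # 32 4 -> latents are [B, 4, 32, 32], 8x spatial compression
```

With len([1, 2, 4, 4]) - 1 = 3 halvings, 256 / 2**3 = 32, matching the 8x spatial compression and the 32x32x4 latent shape quoted above.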
+ ## Usage
+
+ ```python
+ from safetensors.torch import load_file
+ from inl_diffusion.vae import INLVAE
+
+ # Load weights
+ state_dict = load_file("model.safetensors")
+ vae = INLVAE(image_size=256, base_channels=128, latent_dim=4)
+ vae.load_state_dict(state_dict)
+ vae.eval()
+
+ # Encode: images is a float tensor of shape [B, 3, 256, 256]
+ latents = vae.encode(images)  # [B, 4, 32, 32]
+
+ # Decode
+ reconstructed = vae.decode(latents)  # [B, 3, 256, 256]
+ ```
+
+ ## Training
+
+ Trained on WikiArt (81K images) for 15K steps with:
+ - Batch size: 16
+ - Learning rate: 1e-4
+ - Mixed precision: bf16
+
+ ## License
+
+ Apache 2.0