---
license: apache-2.0
language:
- en
base_model:
- madebyollin/sdxl-vae-fp16-fix
- stabilityai/sdxl-vae
library_name: diffusers
---
# SDXL-VAE finetuned

=== Eval ===
```
SDXL VAE fp16 fix          | MSE=2.018e-03 PSNR=29.67 LPIPS=0.124 Edge=0.188 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.108]
aiartlab/SDXLVAE           | MSE=1.736e-03 PSNR=30.29 LPIPS=0.116 Edge=0.181 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.108]
```
=== Percent (relative to the fp16-fix baseline; higher is better) ===
```
| Model                      |       MSE |      PSNR |     LPIPS |      Edge |
|----------------------------|-----------|-----------|-----------|-----------|
| SDXL VAE fp16 fix          |      100% |      100% |      100% |      100% |
| aiartlab/SDXLVAE           |    116.3% |    102.1% |    107.3% |    103.7% |
```
[![Interactive comparison (click to open)](vae.png)](https://imgsli.com/NDE1MjY1/1/2)


![Zoomed comparison](result.png)

### Diffusers
```py
from diffusers import AutoencoderKL

# Load the fine-tuned VAE in fp16 on the GPU
vae = AutoencoderKL.from_pretrained("AiArtLab/sdxl_vae", subfolder="vae").cuda().half()
```

### Training status (in progress)

We are currently testing whether the SDXL VAE decoder can be improved by increasing its depth (an asymmetric VAE). This increases model size slightly (by roughly 20 percent), but we expect better reconstruction quality without modifying the encoder, so SDXL itself does not need retraining. Unfortunately, our resources are quite limited (we train on consumer GPUs and are currently training three models: SDXL VAE, Simple Diffusion, and Simple VAE), so please be patient: model training is a meticulous and time-consuming process.
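
To illustrate the idea (a toy model, not the actual SDXL VAE architecture; the names and block sizes here are made up for the sketch, and the exact growth percentage depends on which blocks are added), deepening only the decoder grows its parameter count while the encoder, and therefore the latent space SDXL was trained on, stays untouched:

```python
import torch
from torch import nn

def res_block(ch):
    """A minimal residual-style conv block used to deepen the decoder."""
    return nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.SiLU(),
                         nn.Conv2d(ch, ch, 3, padding=1))

def decoder(extra_blocks=0):
    """Toy decoder: only the number of blocks changes between variants."""
    layers = [nn.Conv2d(4, 64, 3, padding=1)]
    layers += [res_block(64) for _ in range(2 + extra_blocks)]
    layers += [nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1)]
    return nn.Sequential(*layers)

def n_params(m):
    return sum(p.numel() for p in m.parameters())

base, deeper = n_params(decoder(0)), n_params(decoder(1))
print(f"decoder grows by {100 * (deeper - base) / base:.0f}%")
```

Because the encoder is untouched, existing SDXL checkpoints keep producing latents the deeper decoder can consume.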

## VAE Training Process

 - Encoder: Frozen (to avoid retraining SDXL for the new VAE).
 - Dataset: 100,000 PNG images
 - Training Time: 4 days
 - Hardware: Single RTX 4090
 - Resolution: 512px
 - Precision: FP32
 - Effective Batch Size: 16 (batch size 2 + gradient accumulation 8)
 - Optimizer: AdamW (8-bit)
 - Additional losses: MSE and Edge loss (run log: https://wandb.ai/recoilme/vae/runs/qy438uak)
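
The recipe above can be sketched as a decoder-only update loop. This is a toy stand-in (tiny conv modules instead of the real SDXL VAE, plain AdamW instead of the 8-bit variant, MAE as the reconstruction loss) meant only to show the frozen-encoder and gradient-accumulation pattern:

```python
import torch
from torch import nn

# Toy stand-ins for the VAE encoder and decoder.
encoder = nn.Conv2d(3, 4, 3, stride=2, padding=1)
decoder = nn.ConvTranspose2d(4, 3, 4, stride=2, padding=1)

for p in encoder.parameters():
    p.requires_grad_(False)          # encoder stays frozen

optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-4)
accum_steps = 8                      # batch 2 x accumulation 8 = effective 16

optimizer.zero_grad()
for step in range(accum_steps):
    images = torch.rand(2, 3, 64, 64)        # stand-in for 512px crops
    with torch.no_grad():
        latents = encoder(images)            # no gradients through the encoder
    recon = decoder(latents)
    # MAE reconstruction loss, divided so gradients average over the effective batch
    loss = torch.mean(torch.abs(recon - images)) / accum_steps
    loss.backward()
optimizer.step()                     # one update per effective batch of 16
```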

## Implementation

 - Base Code: Used a simple diffusion model training script.
 - Training Target: Only the decoder, focusing on image reconstruction.

## Loss Functions

 - Initially used LPIPS and MSE.
 - Noticed the FID score improving while images became blurry (FID favors blur, so a better FID is not always better quality).
 - Switched to MAE.
 - Balanced LPIPS and MAE at 90/10 ratio.
 - Used median perceptual_loss_weight for better balance.
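
The card does not spell out how the median `perceptual_loss_weight` was computed, so the following is one plausible reading, not the exact training code: keep a running median of the MAE/LPIPS magnitude ratio and use it to rescale the perceptual term before applying the 90/10 blend, so neither term dominates as the loss magnitudes drift during training:

```python
import statistics

class MedianBalancedLoss:
    """Blend perceptual (LPIPS) and pixel (MAE) losses at a fixed ratio,
    rescaling the perceptual term by the running median of the MAE/LPIPS
    magnitude ratio. Hypothetical sketch of the 'median weight' idea."""

    def __init__(self, lpips_ratio=0.9, window=100):
        self.lpips_ratio = lpips_ratio
        self.window = window
        self.ratios = []

    def __call__(self, lpips_value, mae_value):
        self.ratios.append(mae_value / max(lpips_value, 1e-8))
        self.ratios = self.ratios[-self.window:]          # sliding window
        weight = statistics.median(self.ratios)           # robust scale factor
        return (self.lpips_ratio * weight * lpips_value
                + (1 - self.lpips_ratio) * mae_value)
```

With a single observation the median equals the current ratio, so the two terms start out on the same scale and drift apart only slowly afterwards.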

## Compare

https://imgsli.com/NDE1MjY1/1/2

## Donations

Please contact us if you can provide GPUs or funding for training.

DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83

BTC: 3JHv9Hb8kEW8zMAccdgCdZGfrHeMhH1rpN

## Contacts

[recoilme](https://t.me/recoilme)