---
license: apache-2.0
language:
- en
base_model:
- madebyollin/sdxl-vae-fp16-fix
- stabilityai/sdxl-vae
library_name: diffusers
---
# SDXL-VAE finetuned
### Eval
```
SDXL VAE fp16 fix | MSE=2.018e-03 PSNR=29.67 LPIPS=0.124 Edge=0.188 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.108]
aiartlab/SDXLVAE | MSE=1.736e-03 PSNR=30.29 LPIPS=0.116 Edge=0.181 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.108]
```
### Percent (relative to SDXL VAE fp16 fix; higher is better)

| Model                      | MSE       | PSNR      | LPIPS     | Edge      |
|----------------------------|-----------|-----------|-----------|-----------|
| SDXL VAE fp16 fix          | 100%      | 100%      | 100%      | 100%      |
| aiartlab/SDXLVAE           | 116.3%    | 102.1%    | 107.3%    | 103.7%    |
[Image comparison (imgsli)](https://imgsli.com/NDE1MjY1/1/2)

### Diffusers
```
from diffusers import AutoencoderKL

# Load the finetuned VAE in half precision on the GPU
vae = AutoencoderKL.from_pretrained("AiArtLab/sdxl_vae", subfolder="vae").cuda().half()
```
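A minimal usage sketch with a full SDXL pipeline. The base checkpoint name, prompt, and fp16 settings below are illustrative assumptions, not part of this model card:
```
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# Assumed base checkpoint; swap in whichever SDXL model you actually use
vae = AutoencoderKL.from_pretrained("AiArtLab/sdxl_vae", subfolder="vae", torch_dtype=torch.float16)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of a cat").images[0]
image.save("cat.png")
```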
### Training status (in progress)
We are currently testing whether the SDXL VAE decoder can be improved by increasing its depth (an asymmetric VAE). This increases the model size by roughly 20%, but we expect it to improve reconstruction quality without touching the encoder, so SDXL itself does not need to be retrained. Our resources are limited (we train on consumer GPUs and are currently training three models: SDXL VAE, Simple Diffusion, and Simple VAE), so please be patient; model training is a meticulous and time-consuming process.
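One way to prototype such a deeper decoder is diffusers' `AsymmetricAutoencoderKL`. The sketch below keeps the SDXL encoder layout and only deepens the up blocks; the exact channel widths and layer counts are illustrative assumptions, not the final architecture:
```
from diffusers import AsymmetricAutoencoderKL

# Illustrative config only: encoder layout matches the SDXL VAE,
# while the decoder gets one extra layer per up block (hence the ~20% size increase)
vae = AsymmetricAutoencoderKL(
    in_channels=3,
    out_channels=3,
    down_block_types=("DownEncoderBlock2D",) * 4,
    down_block_out_channels=(128, 256, 512, 512),
    layers_per_down_block=2,
    up_block_types=("UpDecoderBlock2D",) * 4,
    up_block_out_channels=(128, 256, 512, 512),
    layers_per_up_block=3,
    latent_channels=4,
    norm_num_groups=32,
    sample_size=1024,
    scaling_factor=0.13025,
)
```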
## VAE Training Process
- Encoder: Frozen (to avoid retraining SDXL for the new VAE); see the training-loop sketch after this list.
- Dataset: 100,000 PNG images
- Training Time: 4 days
- Hardware: Single RTX 4090
- Resolution: 512px
- Precision: FP32
- Effective Batch Size: 16 (batch size 2 + gradient accumulation 8)
- Optimizer: AdamW (8-bit)
- Added MSE and edge loss: https://wandb.ai/recoilme/vae/runs/qy438uak
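A decoder-only training sketch under the settings above. The learning rate, the `dataloader` (assumed to yield 512px image batches scaled to [-1, 1]), and the plain MSE objective are assumptions for illustration, not the exact training script:
```
import torch
import torch.nn.functional as F
import bitsandbytes as bnb
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix").cuda().float()
vae.encoder.requires_grad_(False)        # frozen encoder keeps latents compatible with SDXL
vae.quant_conv.requires_grad_(False)

trainable = [p for p in vae.parameters() if p.requires_grad]
optimizer = bnb.optim.AdamW8bit(trainable, lr=1e-5)   # 8-bit AdamW; lr is an assumption

accum = 8                                # batch size 2 * accumulation 8 = effective batch 16
for step, images in enumerate(dataloader):
    images = images.cuda()
    with torch.no_grad():
        latents = vae.encode(images).latent_dist.sample()
    recon = vae.decode(latents).sample
    loss = F.mse_loss(recon, images) / accum
    loss.backward()
    if (step + 1) % accum == 0:
        optimizer.step()
        optimizer.zero_grad()
```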
## Implementation
- Base Code: Used a simple diffusion model training script.
- Training Target: Only the decoder, focusing on image reconstruction.
## Loss Functions
- Initially used LPIPS and MSE.
- The FID score improved, but images became blurry (FID can favor blurry reconstructions, so a better FID does not always mean better quality).
- Switched from MSE to MAE.
- Balanced LPIPS and MAE at a 90/10 ratio.
- Used the median perceptual_loss_weight for a more stable balance (see the sketch below).
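A sketch of how such a median-rescaled 90/10 LPIPS/MAE mix might look. The window size, the VGG backbone, and the exact form of the weighting are assumptions about the described approach:
```
import torch
import torch.nn.functional as F
import lpips

lpips_fn = lpips.LPIPS(net="vgg").cuda().eval()
weight_history = []

def reconstruction_loss(recon, target, lpips_share=0.9):
    mae = F.l1_loss(recon, target)
    perceptual = lpips_fn(recon, target).mean()
    # Rescale the perceptual term with a running median of mae/perceptual so that
    # LPIPS and MAE keep roughly a 90/10 contribution regardless of batch content
    weight_history.append((mae / (perceptual + 1e-8)).detach())
    w = torch.stack(weight_history[-256:]).median()
    return lpips_share * w * perceptual + (1.0 - lpips_share) * mae
```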
## Compare
https://imgsli.com/NDE1MjY1/1/2
## Donations
Please contact us if you can provide GPUs or funding for training.
DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83
BTC: 3JHv9Hb8kEW8zMAccdgCdZGfrHeMhH1rpN
## Contacts
[recoilme](https://t.me/recoilme)