---
license: apache-2.0
---

## Comparison

## Metrics

Reconstruction metrics per VAE (lower MSE, LPIPS and Edge and higher PSNR are better), followed by statistics of the encoded latents: Z gives min/mean/max/std, and Skew and Kurt give the min/mean/max of skewness and kurtosis.

```text
SD15 VAE | MSE=2.732e-03 PSNR=28.10 LPIPS=0.147 Edge=0.206 KL=19.821 | Z[min/mean/max/std]=[-17.375, 0.072, 16.203, 0.900] | Skew[min/mean/max]=[-0.543, -0.126, 0.070] | Kurt[min/mean/max]=[-0.151, 1.228, 4.574]
SDXL VAE fp16 fix | MSE=2.018e-03 PSNR=29.67 LPIPS=0.124 Edge=0.188 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.107]
AiArtLab/sdxl_vae | MSE=1.736e-03 PSNR=30.29 LPIPS=0.116 Edge=0.181 KL=32.222 | Z[min/mean/max/std]=[-4.066, -0.014, 4.301, 0.861] | Skew[min/mean/max]=[-0.017, 0.105, 0.165] | Kurt[min/mean/max]=[-0.380, -0.228, -0.107]
LTX-Video VAE | MSE=1.202e-03 PSNR=31.84 LPIPS=0.141 Edge=0.168 KL=6.656 | Z[min/mean/max/std]=[-5.043, 0.011, 4.969, 0.272] | Skew[min/mean/max]=[-0.542, -0.018, 0.411] | Kurt[min/mean/max]=[-0.576, 0.741, 1.843]
Wan2.2-TI2V-5B | MSE=7.782e-04 PSNR=34.25 LPIPS=0.052 Edge=0.121 KL=9.472 | Z[min/mean/max/std]=[-4.789, -0.012, 4.266, 0.375] | Skew[min/mean/max]=[-0.397, 0.022, 0.653] | Kurt[min/mean/max]=[-0.482, 0.006, 0.538]
AiArtLab/wan16x_vae | MSE=7.275e-04 PSNR=34.51 LPIPS=0.051 Edge=0.118 KL=9.472 | Z[min/mean/max/std]=[-4.789, -0.012, 4.266, 0.375] | Skew[min/mean/max]=[-0.397, 0.022, 0.653] | Kurt[min/mean/max]=[-0.482, 0.006, 0.538]
Wan2.2-T2V-A14B | MSE=7.073e-04 PSNR=34.59 LPIPS=0.048 Edge=0.115 KL=7.781 | Z[min/mean/max/std]=[-15.336, -0.159, 17.703, 2.563] | Skew[min/mean/max]=[-0.343, 0.006, 0.367] | Kurt[min/mean/max]=[-0.538, -0.071, 0.594]
QwenImage | MSE=6.549e-04 PSNR=35.21 LPIPS=0.047 Edge=0.110 KL=7.776 | Z[min/mean/max/std]=[-15.297, -0.158, 17.688, 2.561] | Skew[min/mean/max]=[-0.346, 0.005, 0.368] | Kurt[min/mean/max]=[-0.538, -0.072, 0.597]
AuraDiffusion/16ch-vae | MSE=5.361e-04 PSNR=35.80 LPIPS=0.041 Edge=0.100 KL=4.421 | Z[min/mean/max/std]=[-1.373, -0.005, 1.621, 0.165] | Skew[min/mean/max]=[-0.331, 0.040, 0.413] | Kurt[min/mean/max]=[-0.170, 0.303, 0.670]
FLUX.1-schnell VAE | MSE=4.594e-04 PSNR=35.87 LPIPS=0.035 Edge=0.088 KL=13.016 | Z[min/mean/max/std]=[-5.824, -0.076, 6.246, 0.945] | Skew[min/mean/max]=[-0.268, 0.048, 0.483] | Kurt[min/mean/max]=[-0.498, 0.037, 0.568]
AiArtLab/simplevae | MSE=4.818e-04 PSNR=36.20 LPIPS=0.035 Edge=0.095 KL=4.032 | Z[min/mean/max/std]=[-7.762, -0.061, 9.914, 0.965] | Skew[min/mean/max]=[-0.320, 0.044, 0.411] | Kurt[min/mean/max]=[-0.045, 0.346, 0.696]
```

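
The card does not define how these columns are computed. The NumPy sketch below shows one plausible reading. It reflects our own assumptions, not the code in train_vae.py:

- Pixel values are in [0, 1].
- PSNR is averaged image by image.
- Edge is the mean L1 distance between Sobel edge maps.
- Skew and Kurt are computed per latent channel.

The negative Kurt values suggest excess kurtosis (0 for a Gaussian), which is what `latent_stats` returns. All function names here are hypothetical.

```python
import numpy as np


def psnr_per_image(orig, recon, max_val=1.0):
    """Mean PSNR in dB over a batch, computed image by image (arrays shaped [N, ...])."""
    orig = np.asarray(orig, dtype=np.float64)
    recon = np.asarray(recon, dtype=np.float64)
    mse = ((orig - recon) ** 2).reshape(orig.shape[0], -1).mean(axis=1)
    mse = np.maximum(mse, 1e-12)  # avoid division by zero for a perfect reconstruction
    return float(np.mean(10.0 * np.log10(max_val ** 2 / mse)))


def sobel_magnitude(img):
    """Sobel gradient magnitude of a 2-D array (valid region only, no padding)."""
    img = np.asarray(img, dtype=np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # Slide the 3x3 kernels by accumulating shifted copies of the image
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return np.sqrt(gx ** 2 + gy ** 2)


def edge_l1(orig, recon):
    """Assumed Edge metric: mean absolute difference of Sobel edge maps (grayscale)."""
    return float(np.mean(np.abs(sobel_magnitude(orig) - sobel_magnitude(recon))))


def latent_stats(z):
    """Z min/mean/max/std plus per-channel skewness and excess kurtosis for [N, C, H, W]."""
    z = np.asarray(z, dtype=np.float64)
    per_channel = z.transpose(1, 0, 2, 3).reshape(z.shape[1], -1)  # [C, N*H*W]
    mu = per_channel.mean(axis=1, keepdims=True)
    sd = per_channel.std(axis=1, keepdims=True)
    standardized = (per_channel - mu) / sd
    skew = (standardized ** 3).mean(axis=1)
    kurt = (standardized ** 4).mean(axis=1) - 3.0  # excess kurtosis: 0 for a Gaussian
    return {
        "z": (z.min(), z.mean(), z.max(), z.std()),
        "skew": (skew.min(), skew.mean(), skew.max()),
        "kurt": (kurt.min(), kurt.mean(), kurt.max()),
    }
```

Averaging PSNR per image weights every image equally in dB. This generally gives a different (higher) number than computing PSNR once from the dataset-wide mean MSE.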
## Percent (relative to SD15 VAE)

Each score is shown as a percentage of SD15 VAE (100%). For PSNR this is the model's value divided by the SD15 value. LPIPS and Edge are lower-is-better, so the SD15 value is divided by the model's value. Higher is better in every column.

| Model                  | PSNR   | LPIPS  | Edge   |
|------------------------|--------|--------|--------|
| SD15 VAE               | 100%   | 100%   | 100%   |
| SDXL VAE fp16 fix      | 105.6% | 118.3% | 109.7% |
| AiArtLab/sdxl_vae      | 107.8% | 126.8% | 113.8% |
| LTX-Video VAE          | 113.3% | 103.8% | 122.5% |
| Wan2.2-TI2V-5B         | 121.9% | 280.8% | 170.8% |
| AiArtLab/wan16x_vae    | 122.8% | 287.3% | 174.2% |
| Wan2.2-T2V-A14B        | 123.1% | 303.2% | 179.4% |
| QwenImage              | 125.3% | 308.8% | 188.0% |
| AuraDiffusion/16ch-vae | 127.4% | 355.5% | 206.6% |
| FLUX.1-schnell VAE     | 127.6% | 424.4% | 234.8% |
| AiArtLab/simplevae     | 128.8% | 415.2% | 217.7% |

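
The percentages can be reproduced from the Metrics block. Here is a minimal standard-library sketch using three rows copied from it. `relative_percent` is a hypothetical helper, not part of the repo.

```python
# Rounded metrics copied from the Metrics block (SD15 VAE is the baseline)
METRICS = {
    "SD15 VAE": {"PSNR": 28.10, "LPIPS": 0.147, "Edge": 0.206},
    "SDXL VAE fp16 fix": {"PSNR": 29.67, "LPIPS": 0.124, "Edge": 0.188},
    "AiArtLab/simplevae": {"PSNR": 36.20, "LPIPS": 0.035, "Edge": 0.095},
}
LOWER_IS_BETTER = {"LPIPS", "Edge"}


def relative_percent(model, baseline="SD15 VAE"):
    """Score of `model` as a percentage of `baseline`; above 100% means better."""
    out = {}
    for metric, value in METRICS[model].items():
        base = METRICS[baseline][metric]
        # Invert the ratio for error-style metrics so that higher is always better
        ratio = base / value if metric in LOWER_IS_BETTER else value / base
        out[metric] = round(100.0 * ratio, 1)
    return out


print(relative_percent("SDXL VAE fp16 fix"))
# -> {'PSNR': 105.6, 'LPIPS': 118.5, 'Edge': 109.6}
```

The inputs are the rounded values printed above, so results can differ slightly from the table. The table was presumably computed from unrounded metrics. The gap is largest for small LPIPS values: this sketch gives 420.0% for AiArtLab/simplevae instead of 415.2%.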
## Compare

Interactive side-by-side comparison on imgsli: https://imgsli.com/NDE1MzE0/5/2

### Diffusers

```python
from diffusers import AutoencoderKL

# Load the VAE from the "vae" subfolder of the repo, move it to the GPU and cast it to fp16
vae = AutoencoderKL.from_pretrained("AiArtLab/simplevae", subfolder="vae").cuda().half()
```

## VAE Training Process

- Initialized from AuraDiffusion/16ch-vae, then a mid block was added and the model retrained (not compatible with the original)
- Dataset: 100,000 PNG images
- Training Time: ~2 weeks
- Hardware: single RTX 5090
- Resolution: 512px
- Precision: FP32
- Effective Batch Size: 16
- Optimizer: AdamW (8-bit)
- Losses: balanced combination of LPIPS, MSE, MAE, Edge and KL

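
The card only says the losses are "balanced". One common way to do that is to divide each term by a running average of its own magnitude, so LPIPS, MSE, MAE, Edge and KL contribute on a similar scale. The sketch below shows this approach. It is purely hypothetical: the names `kl_diag_gaussian` and `LossBalancer` are ours, and train_vae.py may balance the losses differently. The KL term is the standard closed form for a diagonal Gaussian posterior measured against N(0, 1).

```python
import math


def kl_diag_gaussian(mean, logvar):
    """Closed-form KL(N(mean, exp(logvar)) || N(0, 1)), summed over latent elements."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, logvar))


class LossBalancer:
    """Hypothetical balancing scheme: divide each loss by a running average of its
    own magnitude so that every term contributes on a similar scale."""

    def __init__(self, weights, decay=0.99, eps=1e-8):
        self.weights = weights  # relative importance per loss name (default 1.0)
        self.decay = decay      # EMA decay for the magnitude estimates
        self.eps = eps          # avoids division by zero for a loss that is exactly 0
        self.ema = {}           # running average of each loss, keyed by name

    def __call__(self, losses):
        total = 0.0
        for name, value in losses.items():
            prev = self.ema.get(name, value)  # first call: start the EMA at the value itself
            self.ema[name] = self.decay * prev + (1.0 - self.decay) * value
            total += self.weights.get(name, 1.0) * value / (self.ema[name] + self.eps)
        return total
```

On the first call every normalized term is about 1, so five losses with unit weights sum to roughly 5. The per-name `weights` let a term such as KL be down-weighted. In a PyTorch loop the running averages would need to be updated from detached values (for example `loss.item()`), so the normalization itself receives no gradient.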

## Source

Training script: https://huggingface.co/AiArtLab/simplevae/blob/main/train_vae.py

## Acknowledgments

- **[Stan](https://t.me/Stangle)**: key investor. Thank you for believing in us when others called it madness.
- **Captainsaturnus**
- **Love. Death. Transformers.**
- **TOPAPEC**

## Donations

Please contact us if you can provide GPUs or funding for training.

DOGE: DEw2DR8C7BnF8GgcrfTzUjSnGkuMeJhg83

BTC: 3JHv9Hb8kEW8zMAccdgCdZGfrHeMhH1rpN

## Contacts

[recoilme](https://t.me/recoilme)

## Test training

[Test training video](trainvideo.mp4)