Ostris VAE - KL-f8-d16
A 16-channel VAE with 8x downsampling, trained from scratch on a balanced mix of photos, artwork, text, cartoons, and vector images.
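To make the "f8-d16" layout concrete, here is a minimal sketch of the latent shape it implies for an RGB image. The function names are hypothetical, and the requirement that image sides be multiples of 8 is an assumption made for this illustration, not something the model card states.

```python
def latent_shape(height, width, latent_channels=16, downsample=8):
    """Return (channels, height, width) of the latent for an image of the given size.

    Defaults mirror the f8-d16 naming: 8x spatial downsample, 16 latent channels.
    """
    # Assumption for this sketch: sides must divide evenly by the downsample factor.
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, latent_channels=16, downsample=8):
    """Ratio of RGB input values to latent values."""
    c, h, w = latent_shape(height, width, latent_channels, downsample)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (16, 64, 64)
print(compression_ratio(512, 512))  # 12.0
```

For comparison, a 4-channel VAE with the same 8x downsample gives a ratio of 48.0 under the same arithmetic, so a 16-channel latent keeps four times as many values per image.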
This VAE is lighter than most, with only 57,266,643 parameters (vs. 83,819,683 for the SD3 VAE), so it is faster and uses less VRAM while scoring quite similarly on real images. It is also MIT licensed, so you can do whatever you want with it.
| VAE | PSNR (higher better) | LPIPS (lower better) | # params |
|---|---|---|---|
| sd-vae-ft-mse | 26.939 | 0.0581 | 83,653,863 |
| SDXL | 27.370 | 0.0540 | 83,653,863 |
| SD3 | 31.681 | 0.0187 | 83,819,683 |
| Ostris KL-f8-d16 | 31.166 | 0.0198 | 57,266,643 |
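For readers unfamiliar with the metrics, the sketch below shows the standard PSNR formula and the parameter saving relative to the SD3 VAE, using the counts from the table. It is not the evaluation script behind these numbers: the image set, preprocessing, and pixel range used for the table are not stated here, and the function name and the 0-255 range are assumptions.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Parameter counts copied from the table above.
OSTRIS_PARAMS = 57_266_643
SD3_PARAMS = 83_819_683

# Fraction of parameters saved relative to the SD3 VAE.
param_saving = 1 - OSTRIS_PARAMS / SD3_PARAMS
print(f"{param_saving:.1%} fewer parameters than the SD3 VAE")
```

Higher PSNR means the reconstruction is closer to the input pixel for pixel, while LPIPS measures perceptual distance with a learned network, so lower is better.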
Compare
Check out the comparison at imgsli.
What do I do with this?
If you don't know, you probably don't need this. It is meant as an open-source, lighter-weight alternative to existing 16-channel VAEs. It is not useful on its own: a network has to be trained to use it first. I plan to do this myself for SD 1.5, SDXL, and possibly PixArt. Follow me on Twitter to keep up with my work on that.
Note: Not SD3 compatible
This VAE is not SD3 compatible: it was trained from scratch and has an entirely different latent space.