VAEs for Image Generation

This repository hosts a curated collection of VAE checkpoints used by diffusion-based and transformer-based image generation pipelines.

Available VAEs

Model	Source	latent_channels	num_vq_embeddings	vq_embed_dim	sample_size
VQDIFFUSION-VQVAE	VQ-Diffusion (microsoft/vq-diffusion-ithq)	256	4096	128	32
IBQ-VQVAE-1024	IBQ (TencentARC/SEED)	256	1024	256	32
IBQ-VQVAE-8192	IBQ (TencentARC/SEED)	256	8192	256	32
IBQ-VQVAE-16384	IBQ (TencentARC/SEED)	256	16384	256	32
IBQ-VQVAE-262144	IBQ (TencentARC/SEED)	256	262144	256	32
MOVQGAN-67M	MOVQGAN	4	16384	4	256
MOVQGAN-102M	MOVQGAN	4	16384	4	256
MOVQGAN-270M	MOVQGAN	4	16384	4	256

AutoencoderKL (SD, FLUX, SANA, Qwen, etc.):

from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "BiliSakura/VAEs",
    subfolder="SDXL-VAE",
)

VQModel (VQ-Diffusion, IBQ, MOVQGAN):

from diffusers import VQModel

vae = VQModel.from_pretrained(
    "BiliSakura/VAEs",
    subfolder="VQDIFFUSION-VQVAE",
)
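
Because the two families load through different classes, a small dispatcher can pick the class from the subfolder name. The following is a minimal sketch and not part of this repository: `VQ_SUBFOLDERS`, `loader_class_name`, and `load_vae` are hypothetical names, and it assumes that every subfolder not listed in the table above is an AutoencoderKL checkpoint.

```python
# Subfolder names taken from the table above; per this README they load with VQModel.
VQ_SUBFOLDERS = {
    "VQDIFFUSION-VQVAE",
    "IBQ-VQVAE-1024",
    "IBQ-VQVAE-8192",
    "IBQ-VQVAE-16384",
    "IBQ-VQVAE-262144",
    "MOVQGAN-67M",
    "MOVQGAN-102M",
    "MOVQGAN-270M",
}


def loader_class_name(subfolder: str) -> str:
    """Return the name of the diffusers class to use (hypothetical helper)."""
    # Assumption: anything outside the VQ table is an AutoencoderKL checkpoint.
    return "VQModel" if subfolder in VQ_SUBFOLDERS else "AutoencoderKL"


def load_vae(subfolder: str, repo_id: str = "BiliSakura/VAEs"):
    """Load a checkpoint from the repository with the matching class."""
    # Imported lazily so loader_class_name works without diffusers installed.
    import diffusers

    cls = getattr(diffusers, loader_class_name(subfolder))
    return cls.from_pretrained(repo_id, subfolder=subfolder)
```

With this in place, `load_vae("IBQ-VQVAE-8192")` would go through `VQModel`, while `load_vae("SDXL-VAE")` would go through `AutoencoderKL`.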

All checkpoints are intended for inference in their corresponding pipelines.
The latent_channels column is listed to help match each VAE with a compatible backbone.
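
As an illustration of matching on latent channel count, the sketch below copies the latent_channels column from the table into a plain dictionary and filters it. `LATENT_CHANNELS` and `vaes_with_channels` are hypothetical names; the channel count to look for would have to come from the backbone's own configuration.

```python
# latent_channels values copied from the table above.
LATENT_CHANNELS = {
    "VQDIFFUSION-VQVAE": 256,
    "IBQ-VQVAE-1024": 256,
    "IBQ-VQVAE-8192": 256,
    "IBQ-VQVAE-16384": 256,
    "IBQ-VQVAE-262144": 256,
    "MOVQGAN-67M": 4,
    "MOVQGAN-102M": 4,
    "MOVQGAN-270M": 4,
}


def vaes_with_channels(channels: int) -> list[str]:
    """List the VAEs whose latent channel count equals `channels`, in table order."""
    return [name for name, count in LATENT_CHANNELS.items() if count == channels]


print(vaes_with_channels(4))
# ['MOVQGAN-67M', 'MOVQGAN-102M', 'MOVQGAN-270M']
```

On a model that is already loaded, the same value can be read from `vae.config.latent_channels`.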