VAEs for Image Generation
This repository hosts a curated collection of VAE (AutoencoderKL) and VQ-VAE (VQModel) checkpoints used by diffusion and transformer-based image generation pipelines.
Available VAEs
AutoencoderKL
| Model | Source | Latent Channels |
|---|---|---|
| SD21-VAE | Stable Diffusion 2.1 | 4 |
| SDXL-VAE | Stable Diffusion XL | 4 |
| SD35-VAE | Stable Diffusion 3.5 | 16 |
| FLUX1-VAE | FLUX.1 | 16 |
| FLUX2-VAE | FLUX.2 | 32 |
| SANA-VAE | SANA (DC-AE) | 32 |
| Qwen-VAE | Qwen-Image | 16 |
VQModel
| Model | Source | `latent_channels` | `num_vq_embeddings` | `vq_embed_dim` | `sample_size` |
|---|---|---|---|---|---|
| VQDIFFUSION-VQVAE | VQ-Diffusion (microsoft/vq-diffusion-ithq) | 256 | 4096 | 128 | 32 |
| IBQ-VQVAE-1024 | IBQ (TencentARC/SEED) | 256 | 1024 | 256 | 32 |
| IBQ-VQVAE-8192 | IBQ (TencentARC/SEED) | 256 | 8192 | 256 | 32 |
| IBQ-VQVAE-16384 | IBQ (TencentARC/SEED) | 256 | 16384 | 256 | 32 |
| IBQ-VQVAE-262144 | IBQ (TencentARC/SEED) | 256 | 262144 | 256 | 32 |
| MOVQGAN-67M | MOVQGAN | 4 | 16384 | 4 | 256 |
| MOVQGAN-102M | MOVQGAN | 4 | 16384 | 4 | 256 |
| MOVQGAN-270M | MOVQGAN | 4 | 16384 | 4 | 256 |
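In the VQModel table, `num_vq_embeddings` is the codebook size K and `vq_embed_dim` is the length D of each code vector. The NumPy sketch below illustrates what these two numbers mean. It performs plain nearest-neighbour vector quantization and reports the bits needed to store one code index (log2 K). This is a generic textbook sketch, not the quantizer of any model listed here, since training-time details differ between these families. The random codebook is purely illustrative and is only sized like the IBQ-VQVAE-1024 row.

```python
import math

import numpy as np


def quantize(z: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each row of z (N, D) to its nearest codebook row (K, D) by L2 distance.

    Returns (indices of shape (N,), quantized vectors of shape (N, D)).
    """
    # Squared L2 distance between every input vector and every code: shape (N, K).
    d2 = (
        np.sum(z ** 2, axis=1, keepdims=True)
        - 2.0 * z @ codebook.T
        + np.sum(codebook ** 2, axis=1)
    )
    indices = np.argmin(d2, axis=1)
    return indices, codebook[indices]


def bits_per_index(num_vq_embeddings: int) -> float:
    """Bits needed to store one code index for a codebook of the given size."""
    return math.log2(num_vq_embeddings)


# Illustrative only: a random codebook with K=1024 codes of dimension D=256.
rng = np.random.default_rng(0)
codebook = rng.standard_normal((1024, 256)).astype(np.float32)

# Slightly perturbed copies of codes 3, 500 and 1023 should snap back to those codes.
z = codebook[[3, 500, 1023]] + 0.01 * rng.standard_normal((3, 256)).astype(np.float32)
idx, zq = quantize(z, codebook)
print(idx.tolist())  # [3, 500, 1023]

# Index size for each codebook size in the table.
for k in (1024, 4096, 8192, 16384, 262144):
    print(f"K={k}: {bits_per_index(k):.0f} bits per index")
```

Larger codebooks, such as IBQ-VQVAE-262144 at 18 bits per index, trade a bigger index vocabulary for finer quantization.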
Diffusers usage
AutoencoderKL (SD, FLUX, SANA, Qwen, etc.):
```python
from diffusers import AutoencoderKL

# Each VAE lives in its own subfolder of the repo; pick a name from the table above.
vae = AutoencoderKL.from_pretrained(
    "BiliSakura/VAEs",
    subfolder="SDXL-VAE",
)
```
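Once a VAE is loaded, pipelines generally do not pass its raw latents to the backbone. In Diffusers, latents are normalized with the `scaling_factor` stored in the VAE config. SD3- and FLUX-style VAEs also shift them by `shift_factor`, and decoding applies the inverse. The NumPy sketch below shows only that arithmetic, and the same operations work on torch tensors. It assumes both values are read from the chosen subfolder's `config.json`, for example via `vae.config.scaling_factor`. The numbers in the demo are made up and are not taken from any checkpoint here.

```python
import numpy as np


def normalize_latents(latents, scaling_factor, shift_factor=None):
    """Encoder side: bring raw VAE latents to the scale the backbone expects."""
    # shift_factor is None for SD/SDXL-style configs that only scale.
    if shift_factor is not None:
        latents = latents - shift_factor
    return latents * scaling_factor


def denormalize_latents(latents, scaling_factor, shift_factor=None):
    """Decoder side: undo normalize_latents before calling the VAE decoder."""
    latents = latents / scaling_factor
    if shift_factor is not None:
        latents = latents + shift_factor
    return latents


# Made-up values for illustration; real ones come from the VAE's config.json.
raw = np.random.default_rng(0).standard_normal((1, 16, 128, 128))
z = normalize_latents(raw, scaling_factor=0.5, shift_factor=0.1)
restored = denormalize_latents(z, scaling_factor=0.5, shift_factor=0.1)
print(np.allclose(raw, restored))  # True
```

With a loaded `vae`, the encoder-side call would look like `normalize_latents(vae.encode(x).latent_dist.sample(), vae.config.scaling_factor, getattr(vae.config, "shift_factor", None))`.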
VQModel (VQ-Diffusion, IBQ, MOVQGAN):
```python
from diffusers import VQModel

# VQ checkpoints load the same way; swap in any subfolder from the VQModel table.
vae = VQModel.from_pretrained(
    "BiliSakura/VAEs",
    subfolder="VQDIFFUSION-VQVAE",
)
```
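Both snippets above take the loader class from the table heading. Every Diffusers checkpoint also records the class it was saved with under `_class_name` in its `config.json`, so checking that file first avoids guessing. The helper below, `loader_class_for`, is hypothetical and not part of any library. It reads a local `config.json` and returns the class name together with the latent and codebook fields shown in the tables. The config could be fetched with, for example, `huggingface_hub.hf_hub_download("BiliSakura/VAEs", "SDXL-VAE/config.json")`. The demo uses a hand-written stand-in config built from the VQDIFFUSION-VQVAE row.

```python
import json
import tempfile
from pathlib import Path

# Config keys reported in the tables above; keys a given config lacks are skipped.
TABLE_KEYS = ("latent_channels", "num_vq_embeddings", "vq_embed_dim", "sample_size")


def loader_class_for(config_path):
    """Hypothetical helper: return (Diffusers class name, selected config fields)."""
    config = json.loads(Path(config_path).read_text())
    # Diffusers writes the saving class under "_class_name" in config.json.
    class_name = config.get("_class_name")
    if class_name is None:
        raise ValueError(f"{config_path} has no '_class_name'; not a Diffusers config?")
    fields = {key: config[key] for key in TABLE_KEYS if key in config}
    return class_name, fields


# Stand-in config for the demo; a real one would be downloaded from the Hub.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    path.write_text(json.dumps({
        "_class_name": "VQModel",
        "latent_channels": 256,
        "num_vq_embeddings": 4096,
        "vq_embed_dim": 128,
    }))
    result = loader_class_for(path)

print(result)
# ('VQModel', {'latent_channels': 256, 'num_vq_embeddings': 4096, 'vq_embed_dim': 128})
```

You can then resolve the returned name with `getattr(diffusers, class_name)` and call `.from_pretrained("BiliSakura/VAEs", subfolder=...)` on it.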
Notes
- All checkpoints are intended for inference in their corresponding pipelines.
- Latent channel counts are listed to help pair each VAE with a backbone that expects the same number of latent channels.
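To make the channel-matching note concrete, the sketch below copies the Latent Channels column of the AutoencoderKL table into a dict. It then lists the VAEs whose channel count equals the number a backbone expects. You supply that expected count yourself, from the backbone's documentation or config. Some transformer backbones pack latent patches, so their reported input width can be a multiple of this number. A matching count is necessary but not sufficient, because spatial compression and latent normalization also have to agree.

```python
# Latent channel counts copied from the AutoencoderKL table above.
LATENT_CHANNELS = {
    "SD21-VAE": 4,
    "SDXL-VAE": 4,
    "SD35-VAE": 16,
    "FLUX1-VAE": 16,
    "FLUX2-VAE": 32,
    "SANA-VAE": 32,
    "Qwen-VAE": 16,
}


def vaes_with_channels(expected: int) -> list[str]:
    """Return subfolder names whose latent channel count equals `expected`, sorted."""
    return sorted(name for name, ch in LATENT_CHANNELS.items() if ch == expected)


print(vaes_with_channels(16))  # ['FLUX1-VAE', 'Qwen-VAE', 'SD35-VAE']
print(vaes_with_channels(4))   # ['SD21-VAE', 'SDXL-VAE']
```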