---
license: mit
pipeline_tag: image-to-image
---

# Vector Quantization using Gaussian Variational Autoencoder
|
This repository contains the official implementation of **Gaussian Quant (GQ)**, a novel method for vector quantization presented in the paper "[Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)".
|
GQ is a simple yet effective technique that converts a Gaussian Variational Autoencoder (VAE) into a VQ-VAE without any additional training: it uses random Gaussian noise as the codebook and quantizes each posterior mean to its closest codeword. Theoretically, a small quantization error is guaranteed whenever the logarithm of the codebook size exceeds the bits-back coding rate. Empirically, GQ combined with a heuristic called the target divergence constraint (TDC) outperforms previous VQ-VAEs such as VQGAN, FSQ, LFQ, and BSQ on both UNet and ViT architectures.
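To make the core idea concrete, here is a minimal, illustrative sketch of the GQ step under simplified assumptions (this is not the repository's implementation, and it omits the TDC heuristic): draw a fixed random Gaussian codebook once, then snap each posterior mean to its nearest codeword.

```python
import torch

# Illustrative sketch only: GQ's codebook is just fixed Gaussian noise.
torch.manual_seed(0)
codebook_size, codebook_dim = 2**16, 16
codebook = torch.randn(codebook_size, codebook_dim)  # no codebook training required

def gq_quantize(mu: torch.Tensor):
    """Snap posterior means (N, codebook_dim) to their nearest Gaussian codewords."""
    dists = torch.cdist(mu, codebook)  # (N, codebook_size) pairwise L2 distances
    indices = dists.argmin(dim=1)      # one discrete code per latent vector
    return codebook[indices], indices

mu = torch.randn(4, codebook_dim)      # stand-in for Gaussian VAE posterior means
z_q, idx = gq_quantize(mu)             # quantized latents and their discrete indices
```

Because the codebook is pure noise, nothing about it is learned; the only design choices are its size and dimension.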
|
- 📚 **Paper on Hugging Face:** [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)
- 🌐 **Project Page:** [https://tongdaxu.github.io/pages/gq.html](https://tongdaxu.github.io/pages/gq.html)
- 💻 **GitHub Repository:** [https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)
|
## Quick Start & Usage
|
This section provides a quick guide to installing the necessary dependencies, downloading pre-trained models, and running inference with them. For more details and training instructions, please refer to the [GitHub repository](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE).
|
### Install dependencies
|
* Install the dependencies in `environment.yaml`:
```bash
conda env create --file=environment.yaml
conda activate tokenizer
```

### Install this package
|
* From source:
```bash
pip install -e .
```
* [Optional] CUDA kernel for faster runtime:
```bash
cd gq_cuda_extension
pip install --no-build-isolation -e .
```

### Download pre-trained model
|
* Download the model "sd3unet_gq_0.25.ckpt" from [Hugging Face](https://huggingface.co/xutongda/GQModel):
```bash
mkdir models_256
mv "sd3unet_gq_0.25.ckpt" ./models_256
```
* This is a VQ-VAE with `codebook_size = 2**16 = 65536` and `codebook_dim = 16`.
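Since `2**16` codes fit exactly into 16 bits, the discrete indices produced below can be stored compactly as `uint16`. A back-of-the-envelope sketch (the 32×32 grid size is hypothetical; the actual latent shape depends on the model's downsampling factor):

```python
import numpy as np

codebook_size = 2**16
print(np.log2(codebook_size))  # 16.0 bits per discrete token

# Indices in [0, 65535] fit exactly into uint16 for compact storage.
indices = np.random.randint(codebook_size, size=32 * 32)  # hypothetical 32x32 token grid
print(indices.astype(np.uint16).nbytes, "bytes")          # 2048 bytes, i.e. 2 per token
```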
### Infer the model as VQ-VAE
|
* Use the model as a VQ-VAE as follows:
```python
import torch
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5])
])

img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load("models_256/sd3unet_gq_0.25.ckpt",
               map_location=torch.device('cpu'))["state_dict"], strict=False
)
vae = vae.eval().cuda()

z, log = vae.encode(img, return_reg_log=True)  # log["indices"] holds the discrete codebook indices
z_hat = vae.dequant(log["indices"])            # map the discrete indices back to the quantized latent
img_hat = vae.decode(z_hat)                    # decode the quantized latent
```
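Continuing the snippet above, the reconstruction can be mapped back from the `[-1, 1]` range of the `Normalize` transform and saved; this is a small convenience helper, not part of the repository's API:

```python
# Undo the [-1, 1] normalization and save the reconstruction as an 8-bit PNG.
img_out = (img_hat.clamp(-1, 1) + 1) / 2                       # -> [0, 1]
img_out = (img_out[0].permute(1, 2, 0) * 255).to(torch.uint8)  # CHW -> HWC
Image.fromarray(img_out.cpu().numpy()).save("demo_hat.png")
```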
### Infer the model as Gaussian VAE
|
* Alternatively, the model can be used as a vanilla Gaussian VAE:
```python
import torch
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5])
])

img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load("models_256/sd3unet_gq_0.25.ckpt",
               map_location=torch.device('cpu'))["state_dict"], strict=False
)
vae = vae.eval().cuda()

z = vae.encode(img, return_reg_log=True)[1]["zhat_noquant"]  # unquantized Gaussian VAE latent
img_hat = vae.decode(z)
```
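Since both examples load the same checkpoint, a quick way to see what quantization costs is to compare each reconstruction against the input, for instance with a simple PSNR helper. This is illustrative only, continuing from the snippet above (it assumes `img` and `img_hat` are tensors in `[-1, 1]`):

```python
def psnr(a, b):
    """PSNR in dB for tensors in [-1, 1] (peak-to-peak range 2, so MAX**2 = 4)."""
    mse = torch.mean((a - b) ** 2)
    return (10 * torch.log10(4.0 / mse)).item()

print(f"PSNR vs. input: {psnr(img, img_hat):.2f} dB")
```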
## Citation
|
If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@misc{xu2025vectorquantizationusinggaussian,
      title={Vector Quantization using Gaussian Variational Autoencoder},
      author={Tongda Xu and Wendi Zheng and Jiajun He and Jose Miguel Hernandez-Lobato and Yan Wang and Ya-Qin Zhang and Jie Tang},
      year={2025},
      eprint={2512.06609},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.06609},
}
```