---
license: mit
pipeline_tag: image-to-image
---

# Vector Quantization using Gaussian Variational Autoencoder
|
This repository contains the official implementation of **Gaussian Quant (GQ)**, a novel method for vector quantization presented in the paper "[Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)".
|
GQ is a simple yet effective technique that converts a Gaussian Variational Autoencoder (VAE) into a VQ-VAE without any additional training: it uses random Gaussian noise as the codebook and quantizes each posterior mean to its closest codeword. Theoretically, a small quantization error is guaranteed whenever the logarithm of the codebook size exceeds the bits-back coding rate. Empirically, GQ combined with a heuristic called the target divergence constraint (TDC) outperforms previous VQ-VAEs such as VQGAN, FSQ, LFQ, and BSQ on both UNet and ViT architectures.
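To make the core idea concrete, here is a minimal, illustrative sketch of the GQ step under simplified assumptions (this is not the repository's implementation, and it omits the TDC heuristic): draw a fixed random Gaussian codebook once, then snap each posterior mean to its nearest codeword.

```python
import torch

# Illustrative sketch only: GQ's codebook is just fixed Gaussian noise.
torch.manual_seed(0)
codebook_size, codebook_dim = 2**16, 16
codebook = torch.randn(codebook_size, codebook_dim)  # no codebook training required

def gq_quantize(mu: torch.Tensor):
    """Snap posterior means (N, codebook_dim) to their nearest Gaussian codewords."""
    dists = torch.cdist(mu, codebook)  # (N, codebook_size) pairwise L2 distances
    indices = dists.argmin(dim=1)      # one discrete code per latent vector
    return codebook[indices], indices

mu = torch.randn(4, codebook_dim)      # stand-in for Gaussian VAE posterior means
z_q, idx = gq_quantize(mu)             # quantized latents and their discrete indices
```

Because the codebook is pure noise, nothing about it is learned; the only design choices are its size and dimension.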
|
- 📚 **Paper on Hugging Face:** [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)
- 🌐 **Project Page:** [https://tongdaxu.github.io/pages/gq.html](https://tongdaxu.github.io/pages/gq.html)
- 💻 **GitHub Repository:** [https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)
|
## Quick Start & Usage
|
This section provides a quick guide to installing the necessary dependencies, downloading pre-trained models, and running inference with them. For more details and training instructions, please refer to the [GitHub repository](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE).
|
### Install dependencies
|
* Install the dependencies in `environment.yaml`:
```bash
conda env create --file=environment.yaml
conda activate tokenizer
```

### Install this package
|
* From source:
```bash
pip install -e .
```
* [Optional] CUDA kernel for faster runtime:
```bash
cd gq_cuda_extension
pip install --no-build-isolation -e .
```

### Download pre-trained model
|
* Download the model "sd3unet_gq_0.25.ckpt" from [Hugging Face](https://huggingface.co/xutongda/GQModel):
```bash
mkdir models_256
mv "sd3unet_gq_0.25.ckpt" ./models_256
```
* This is a VQ-VAE with `codebook_size = 2**16 = 65536` and `codebook_dim = 16`.
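Since `2**16` codes fit exactly into 16 bits, the discrete indices produced below can be stored compactly as `uint16`. A back-of-the-envelope sketch (the 32×32 grid size is hypothetical; the actual latent shape depends on the model's downsampling factor):

```python
import numpy as np

codebook_size = 2**16
print(np.log2(codebook_size))  # 16.0 bits per discrete token

# Indices in [0, 65535] fit exactly into uint16 for compact storage.
indices = np.random.randint(codebook_size, size=32 * 32)  # hypothetical 32x32 token grid
print(indices.astype(np.uint16).nbytes, "bytes")          # 2048 bytes, i.e. 2 per token
```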
### Infer the model as VQ-VAE
|
* Use the model as a VQ-VAE as follows:
```python
import torch
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5])
])

img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load("models_256/sd3unet_gq_0.25.ckpt",
               map_location=torch.device('cpu'))["state_dict"], strict=False
)
vae = vae.eval().cuda()

z, log = vae.encode(img, return_reg_log=True)  # log["indices"] holds the discrete codebook indices
z_hat = vae.dequant(log["indices"])            # map the discrete indices back to the quantized latent
img_hat = vae.decode(z_hat)                    # decode the quantized latent
```
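Continuing the snippet above, the reconstruction can be mapped back from the `[-1, 1]` range of the `Normalize` transform and saved; this is a small convenience helper, not part of the repository's API:

```python
# Undo the [-1, 1] normalization and save the reconstruction as an 8-bit PNG.
img_out = (img_hat.clamp(-1, 1) + 1) / 2                       # -> [0, 1]
img_out = (img_out[0].permute(1, 2, 0) * 255).to(torch.uint8)  # CHW -> HWC
Image.fromarray(img_out.cpu().numpy()).save("demo_hat.png")
```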
### Infer the model as Gaussian VAE
|
* Alternatively, the model can be used as a vanilla Gaussian VAE:
```python
import torch
from PIL import Image
from torchvision import transforms
from omegaconf import OmegaConf
from pit.util import instantiate_from_config

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5])
])

img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
vae = instantiate_from_config(config.model)
vae.load_state_dict(
    torch.load("models_256/sd3unet_gq_0.25.ckpt",
               map_location=torch.device('cpu'))["state_dict"], strict=False
)
vae = vae.eval().cuda()

z = vae.encode(img, return_reg_log=True)[1]["zhat_noquant"]  # unquantized Gaussian VAE latent
img_hat = vae.decode(z)
```
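Since both examples load the same checkpoint, a quick way to see what quantization costs is to compare each reconstruction against the input, for instance with a simple PSNR helper. This is illustrative only, continuing from the snippet above (it assumes `img` and `img_hat` are tensors in `[-1, 1]`):

```python
def psnr(a, b):
    """PSNR in dB for tensors in [-1, 1] (peak-to-peak range 2, so MAX**2 = 4)."""
    mse = torch.mean((a - b) ** 2)
    return (10 * torch.log10(4.0 / mse)).item()

print(f"PSNR vs. input: {psnr(img, img_hat):.2f} dB")
```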
## Citation
|
If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@misc{xu2025vectorquantizationusinggaussian,
      title={Vector Quantization using Gaussian Variational Autoencoder},
      author={Tongda Xu and Wendi Zheng and Jiajun He and Jose Miguel Hernandez-Lobato and Yan Wang and Ya-Qin Zhang and Jie Tang},
      year={2025},
      eprint={2512.06609},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.06609},
}
```