---
license: mit
pipeline_tag: image-to-image
---

# Vector Quantization using Gaussian Variational Autoencoder

This repository contains the official implementation of **Gaussian Quant (GQ)**, a novel method for vector quantization presented in the paper "[Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)".

GQ is a simple technique that converts a Gaussian Variational Autoencoder (VAE) into a VQ-VAE without additional training. It generates random Gaussian noise as a codebook and selects the codeword closest to the posterior mean. The paper proves that the quantization error is small when the logarithm of the codebook size exceeds the bits-back coding rate. Empirically, GQ combined with a heuristic called the target divergence constraint (TDC) outperforms previous VQ-VAEs such as VQGAN, FSQ, LFQ, and BSQ on both UNet and ViT architectures.

-   📚 **Paper on Hugging Face:** [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)
-   🌐 **Project Page:** [https://tongdaxu.github.io/pages/gq.html](https://tongdaxu.github.io/pages/gq.html)
-   💻 **GitHub Repository:** [https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)
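
To make the quantization step from the introduction concrete, here is a minimal pure-Python sketch. It is not the repository's implementation: the helper names (`make_gaussian_codebook`, `squared_distance`, `quantize`), the toy sizes, and the use of plain squared Euclidean distance as the notion of "closest" are all assumptions made for illustration.

```python
import random


def make_gaussian_codebook(codebook_size, dim, seed=0):
    """Draw `codebook_size` codewords of length `dim` from N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(codebook_size)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(mean, codebook):
    """Return (index, codeword) of the codeword nearest to `mean`."""
    index = min(range(len(codebook)),
                key=lambda i: squared_distance(mean, codebook[i]))
    return index, codebook[index]


# Toy sizes for illustration only; the released model uses far larger settings.
codebook = make_gaussian_codebook(codebook_size=1024, dim=4)
posterior_mean = [0.3, -1.2, 0.5, 0.1]  # hypothetical encoder output
index, codeword = quantize(posterior_mean, codebook)
print(index, squared_distance(posterior_mean, codeword))
```

Because the codebook is pure noise drawn from a fixed seed, nothing about it is learned: an encoder and a decoder that share the seed can regenerate the same codebook and exchange only the index.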

## Quick Start & Usage

This section is a quick guide to installing the dependencies, downloading a pre-trained model, and running inference with it. For more details and training instructions, please refer to the [GitHub repository](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE).

### Install dependencies

*   Create and activate the conda environment defined in `environment.yaml`:
    ```bash
    conda env create --file=environment.yaml
    conda activate tokenizer
    ```

### Install this package

*   From source:
    ```bash
    pip install -e .
    ```
*   [Optional] Build the CUDA kernel for faster run time:
    ```bash
    cd gq_cuda_extension
    pip install --no-build-isolation -e .
    ```

### Download pre-trained model

*   Download the model `sd3unet_gq_0.25.ckpt` from [Hugging Face](https://huggingface.co/xutongda/GQModel) and move it into `models_256`, the directory that the inference code below loads from:
    ```bash
    mkdir -p models_256
    mv "sd3unet_gq_0.25.ckpt" ./models_256
    ```
*   This is a VQ-VAE with `codebook_size=2**16=65536` and `codebook_dim=16`.
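
The arithmetic behind these two numbers can be sketched as follows. The block computes the bit budget of one token, `log2(codebook_size)`, and compares it with the rate of a made-up Gaussian posterior. Two things here are assumptions rather than statements from this README: the "bits-back coding rate" is taken to be the KL divergence between a diagonal Gaussian posterior and a standard normal prior, and the `mean` and `std` values are hypothetical, not measured from the model.

```python
import math


def bits_per_token(codebook_size):
    """Number of bits needed to index one codeword."""
    return math.log2(codebook_size)


def gaussian_kl_bits(mean, std):
    """KL( N(mean, diag(std^2)) || N(0, I) ), converted from nats to bits."""
    nats = 0.5 * sum(m * m + s * s - 1.0 - 2.0 * math.log(s)
                     for m, s in zip(mean, std))
    return nats / math.log(2.0)


codebook_size, codebook_dim = 2 ** 16, 16
budget = bits_per_token(codebook_size)   # 16.0 bits per token
per_dim = budget / codebook_dim          # 1.0 bit per latent dimension

# Hypothetical posterior for one 16-dimensional latent vector.
mean = [0.8] * codebook_dim
std = [0.6] * codebook_dim
rate = gaussian_kl_bits(mean, std)

print(f"budget={budget:.1f} bits, rate={rate:.2f} bits, fits={rate < budget}")
```

Under these assumptions, a posterior whose rate stays below the 16-bit budget is in the regime where the introduction's guarantee of a small quantization error applies.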

### Infer the model as VQ-VAE

*   Then use the model as follows:
    ```Python
    from PIL import Image
    from torchvision import transforms
    from omegaconf import OmegaConf
    from pit.util import instantiate_from_config
    import torch

    transform = transforms.Compose([
        transforms.Resize((256,256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5],
                            std=[0.5, 0.5, 0.5])
    ])

    img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
    config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
    vae = instantiate_from_config(config.model)
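    # Load the checkpoint on the CPU first, then move the model to the GPU.
    state_dict = torch.load("models_256/sd3unet_gq_0.25.ckpt",
                            map_location=torch.device("cpu"))["state_dict"]
    vae.load_state_dict(state_dict, strict=False)
    vae = vae.eval().cuda()

    # Inference only, so gradients are not needed.
    with torch.no_grad():
        z, log = vae.encode(img, return_reg_log=True)
        img_hat = vae.dequant(log["indices"])  # reconstruct from the discrete indices
        img_hat = vae.decode(z)  # reconstruct from the quantized latent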
    ```
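
Since the codebook holds `2**16` entries, every discrete index fits in exactly two bytes. The sketch below shows one way to serialize a list of indices with Python's standard `struct` module. The helpers `pack_indices` and `unpack_indices` are hypothetical and not part of this package, and the example indices are made up; with the model they could come from something like `log["indices"].flatten().tolist()`, assuming that entry is an integer tensor.

```python
import struct


def pack_indices(indices):
    """Serialize codebook indices as little-endian unsigned 16-bit integers."""
    if any(not 0 <= i < 2 ** 16 for i in indices):
        raise ValueError("index out of range for a 2**16 codebook")
    return struct.pack(f"<{len(indices)}H", *indices)


def unpack_indices(payload):
    """Inverse of pack_indices: recover the list of integer indices."""
    return list(struct.unpack(f"<{len(payload) // 2}H", payload))


# Made-up indices standing in for the model's output.
indices = [0, 1, 513, 65535]
payload = pack_indices(indices)

assert unpack_indices(payload) == indices
print(len(payload))  # 8 bytes: 2 bytes per index
```

A fixed-width layout like this spends the full 16 bits on every token; an entropy coder could do better if the index distribution is not uniform.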

### Infer the model as Gaussian VAE

*   Alternatively, the same model can be used as a vanilla Gaussian VAE by decoding the unquantized latents:
    ```Python
    from PIL import Image
    from torchvision import transforms
    from omegaconf import OmegaConf
    from pit.util import instantiate_from_config
    import torch

    transform = transforms.Compose([
        transforms.Resize((256,256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5],
                            std=[0.5, 0.5, 0.5])
    ])

    img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
    config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
    vae = instantiate_from_config(config.model)
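    # Load the checkpoint on the CPU first, then move the model to the GPU.
    state_dict = torch.load("models_256/sd3unet_gq_0.25.ckpt",
                            map_location=torch.device("cpu"))["state_dict"]
    vae.load_state_dict(state_dict, strict=False)
    vae = vae.eval().cuda()

    # Inference only, so gradients are not needed.
    with torch.no_grad():
        _, log = vae.encode(img, return_reg_log=True)
        z = log["zhat_noquant"]  # Gaussian VAE latents, before quantization
        img_hat = vae.decode(z)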
    ```

## Citation

If you find our work helpful or inspiring, please feel free to cite it:
```bibtex
@misc{xu2025vectorquantizationusinggaussian,
      title={Vector Quantization using Gaussian Variational Autoencoder},
      author={Tongda Xu and Wendi Zheng and Jiajun He and Jose Miguel Hernandez-Lobato and Yan Wang and Ya-Qin Zhang and Jie Tang},
      year={2025},
      eprint={2512.06609},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2512.06609},
}
```