nielsr HF Staff committed on
Commit bf08bc6 · verified · 1 Parent(s): 587b62d

Improve model card: Add metadata, links, description, and usage

This PR significantly improves the model card for [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609) by:

- Adding the `pipeline_tag: image-to-image` to the metadata for better discoverability and potential inference widget activation.
- Updating the paper link to the official Hugging Face Papers page.
- Including a link to the dedicated project page.
- Adding a concise model description based on the paper's abstract.
- Providing detailed sample usage code snippets (for both VQ-VAE and Gaussian VAE inference) directly from the GitHub repository, making it easier for users to get started.
- Adding the BibTeX citation.

Please review and merge if everything looks good.

Files changed (1)
  1. README.md +123 -2

README.md CHANGED
@@ -1,5 +1,126 @@
  ---
  license: mit
  ---
- See paper in: [Arxiv](arxiv.org/abs/2512.06609)
- See usage in: [Github](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)
  ---
  license: mit
+ pipeline_tag: image-to-image
  ---
+
+ # Vector Quantization using Gaussian Variational Autoencoder
+
+ This repository contains the official implementation of **Gaussian Quant (GQ)**, a novel method for vector quantization presented in the paper "[Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)".
+
+ GQ is a simple yet effective technique that converts a Gaussian Variational Autoencoder (VAE) into a VQ-VAE without additional training: it generates random Gaussian noise as a codebook and selects the codeword closest to the posterior mean. Theoretically, a small quantization error is guaranteed whenever the logarithm of the codebook size exceeds the bits-back coding rate. Empirically, GQ combined with a heuristic called the target divergence constraint (TDC) outperforms previous VQ-VAEs such as VQGAN, FSQ, LFQ, and BSQ on both UNet and ViT architectures.
+
+ - 📚 **Paper on Hugging Face:** [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)
+ - 🌐 **Project Page:** [https://tongdaxu.github.io/pages/gq.html](https://tongdaxu.github.io/pages/gq.html)
+ - 💻 **GitHub Repository:** [https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)
+
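The core idea described above fits in a few lines. The sketch below is a toy illustration under stated assumptions, not the repository's API: the function name, shapes, and seed handling are invented here. It quantizes a batch of posterior means against a codebook of i.i.d. Gaussian noise that is regenerated from a fixed seed, so the codebook is never trained or stored.

```python
import torch

def gaussian_quantize(mu, codebook_size=4096, seed=0):
    """Toy sketch of the GQ idea: quantize posterior means against a shared
    codebook of i.i.d. Gaussian noise regenerated from a fixed seed."""
    g = torch.Generator().manual_seed(seed)
    codebook = torch.randn(codebook_size, mu.shape[-1], generator=g)
    indices = torch.cdist(mu, codebook).argmin(dim=1)  # nearest codeword per latent
    return indices, codebook[indices]                  # discrete code + dequantized latent

mu = torch.randn(8, 16)            # stand-in posterior means, batch of 8 latents
idx, mu_hat = gaussian_quantize(mu)
```

Because encoder and decoder share the seed, only the integer indices need to be transmitted; the paper's analysis concerns how large `codebook_size` must be for `mu_hat` to be close to `mu`.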
+ ## Quick Start & Usage
+
+ This section provides a quick guide to installing the necessary dependencies, downloading the pre-trained models, and running inference with them. For more details and training instructions, please refer to the [GitHub repository](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE).
+
+ ### Install dependency
+
+ * Install dependencies in `environment.yaml`:
+ ```bash
+ conda env create --file=environment.yaml
+ conda activate tokenizer
+ ```
+
+ ### Install this package
+
+ * From source:
+ ```bash
+ pip install -e .
+ ```
+ * [Optional] CUDA kernel for faster runtime:
+ ```bash
+ cd gq_cuda_extension
+ pip install --no-build-isolation -e .
+ ```
+
+ ### Download pre-trained model
+
+ * Download the model "sd3unet_gq_0.25.ckpt" from [Hugging Face](https://huggingface.co/xutongda/GQModel):
+ ```bash
+ mkdir model_256
+ mv "sd3unet_gq_0.25.ckpt" ./model_256
+ ```
+ * This is a VQ-VAE with `codebook_size=2**16=65536` and `codebook_dim=16`.
+
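As a quick sanity check on these numbers (my own arithmetic, not from the repository): each discrete index into a `2**16`-entry codebook costs exactly 16 bits.

```python
import math

codebook_size = 2 ** 16                    # 65536 entries, as in the checkpoint above
bits_per_token = math.log2(codebook_size)  # bits needed to store one discrete index
print(bits_per_token)                      # 16.0
```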
+ ### Infer the model as VQ-VAE
+
+ * Then use the model as follows:
+ ```python
+ from PIL import Image
+ from torchvision import transforms
+ from omegaconf import OmegaConf
+ from pit.util import instantiate_from_config
+ import torch
+
+ transform = transforms.Compose([
+     transforms.Resize((256, 256)),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=[0.5, 0.5, 0.5],
+                          std=[0.5, 0.5, 0.5])
+ ])
+
+ img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
+ config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
+ vae = instantiate_from_config(config.model)
+ vae.load_state_dict(
+     torch.load("./model_256/sd3unet_gq_0.25.ckpt",
+                map_location=torch.device('cpu'))["state_dict"],
+     strict=False
+ )
+ vae = vae.eval().cuda()
+
+ z, log = vae.encode(img, return_reg_log=True)
+ z_q = vae.dequant(log["indices"])  # recover the quantized latent from the discrete indices
+ img_hat = vae.decode(z_q)          # decode the quantized latent
+ ```
+
+ ### Infer the model as Gaussian VAE
+
+ * Alternatively, the model can be used as a vanilla Gaussian VAE:
+ ```python
+ from PIL import Image
+ from torchvision import transforms
+ from omegaconf import OmegaConf
+ from pit.util import instantiate_from_config
+ import torch
+
+ transform = transforms.Compose([
+     transforms.Resize((256, 256)),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=[0.5, 0.5, 0.5],
+                          std=[0.5, 0.5, 0.5])
+ ])
+
+ img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
+ config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
+ vae = instantiate_from_config(config.model)
+ vae.load_state_dict(
+     torch.load("./model_256/sd3unet_gq_0.25.ckpt",
+                map_location=torch.device('cpu'))["state_dict"],
+     strict=False
+ )
+ vae = vae.eval().cuda()
+
+ z = vae.encode(img, return_reg_log=True)[1]["zhat_noquant"]  # continuous Gaussian VAE latents
+ img_hat = vae.decode(z)
+ ```
+
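A simple way to compare the two inference modes is reconstruction PSNR of `img_hat` against `img`. The generic helper below is not from the repository; it assumes tensors normalized to `[-1, 1]`, i.e. a data range of 2:

```python
import torch

def psnr(x, x_hat, data_range=2.0):
    """PSNR in dB between two image tensors with the given data range."""
    mse = torch.mean((x - x_hat) ** 2)
    return 10.0 * torch.log10(data_range ** 2 / mse)

# Example on synthetic tensors: mse = 0.04, so 10 * log10(4 / 0.04) = 20 dB
a = torch.zeros(1, 3, 8, 8)
b = torch.full((1, 3, 8, 8), 0.2)
val = psnr(a, b)
```

Quantization should cost a few dB: the VQ-VAE path decodes `dequant(indices)`, while the Gaussian-VAE path decodes the unquantized `zhat_noquant`.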
+ ## Citation
+
+ If you find our work helpful or inspiring, please feel free to cite it:
+ ```bibtex
+ @misc{xu2025vectorquantizationusinggaussian,
+     title={Vector Quantization using Gaussian Variational Autoencoder},
+     author={Tongda Xu and Wendi Zheng and Jiajun He and Jose Miguel Hernandez-Lobato and Yan Wang and Ya-Qin Zhang and Jie Tang},
+     year={2025},
+     eprint={2512.06609},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG},
+     url={https://arxiv.org/abs/2512.06609},
+ }
+ ```