nielsr HF Staff committed on
Commit bf08bc6 · verified · 1 Parent(s): 587b62d

Improve model card: Add metadata, links, description, and usage

This PR significantly improves the model card for [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609) by:

- Adding the `pipeline_tag: image-to-image` to the metadata for better discoverability and potential inference widget activation.
- Updating the paper link to the official Hugging Face Papers page.
- Including a link to the dedicated project page.
- Adding a concise model description based on the paper's abstract.
- Providing detailed sample usage code snippets (for both VQ-VAE and Gaussian VAE inference) directly from the GitHub repository, making it easier for users to get started.
- Adding the BibTeX citation.

Please review and merge if everything looks good.

Files changed (1)
  1. README.md +123 -2

README.md CHANGED
@@ -1,5 +1,126 @@
  ---
  license: mit
  ---
- See paper in: [Arxiv](arxiv.org/abs/2512.06609)
- See usage in: [Github](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)
  ---
  license: mit
+ pipeline_tag: image-to-image
  ---
+
+ # Vector Quantization using Gaussian Variational Autoencoder
+
+ This repository contains the official implementation of **Gaussian Quant (GQ)**, a novel method for vector quantization presented in the paper "[Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)".
+
+ GQ is a simple yet effective technique that converts a Gaussian Variational Autoencoder (VAE) into a VQ-VAE without additional training: it generates random Gaussian noise as a codebook and selects the codeword closest to the posterior mean. Theoretically, a small quantization error is guaranteed whenever the logarithm of the codebook size exceeds the bits-back coding rate. Empirically, GQ combined with a heuristic called the target divergence constraint (TDC) outperforms previous VQ-VAEs such as VQGAN, FSQ, LFQ, and BSQ on both UNet and ViT architectures.
+
+ - 📚 **Paper on Hugging Face:** [Vector Quantization using Gaussian Variational Autoencoder](https://huggingface.co/papers/2512.06609)
+ - 🌐 **Project Page:** [https://tongdaxu.github.io/pages/gq.html](https://tongdaxu.github.io/pages/gq.html)
+ - 💻 **GitHub Repository:** [https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE)
+
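The core idea described above fits in a few lines. The sketch below is a toy illustration under stated assumptions, not the repository's API: the function name, shapes, and seed handling are invented here. It quantizes a batch of posterior means against a codebook of i.i.d. Gaussian noise that is regenerated from a fixed seed, so the codebook is never trained or stored.

```python
import torch

def gaussian_quantize(mu, codebook_size=4096, seed=0):
    """Toy sketch of the GQ idea: quantize posterior means against a shared
    codebook of i.i.d. Gaussian noise regenerated from a fixed seed."""
    g = torch.Generator().manual_seed(seed)
    codebook = torch.randn(codebook_size, mu.shape[-1], generator=g)
    indices = torch.cdist(mu, codebook).argmin(dim=1)  # nearest codeword per latent
    return indices, codebook[indices]                  # discrete code + dequantized latent

mu = torch.randn(8, 16)            # stand-in posterior means, batch of 8 latents
idx, mu_hat = gaussian_quantize(mu)
```

Because encoder and decoder share the seed, only the integer indices need to be transmitted; the paper's analysis concerns how large `codebook_size` must be for `mu_hat` to be close to `mu`.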
+ ## Quick Start & Usage
+
+ This section provides a quick guide to installing the necessary dependencies, downloading the pre-trained models, and running inference with them. For more details and training instructions, please refer to the [GitHub repository](https://github.com/tongdaxu/VQ-VAE-from-Gaussian-VAE).
+
+ ### Install dependency
+
+ * Install dependencies in `environment.yaml`:
+ ```bash
+ conda env create --file=environment.yaml
+ conda activate tokenizer
+ ```
+
+ ### Install this package
+
+ * From source:
+ ```bash
+ pip install -e .
+ ```
+ * [Optional] CUDA kernel for faster runtime:
+ ```bash
+ cd gq_cuda_extension
+ pip install --no-build-isolation -e .
+ ```
+
+ ### Download pre-trained model
+
+ * Download the model "sd3unet_gq_0.25.ckpt" from [Hugging Face](https://huggingface.co/xutongda/GQModel):
+ ```bash
+ mkdir model_256
+ mv "sd3unet_gq_0.25.ckpt" ./model_256
+ ```
+ * This is a VQ-VAE with `codebook_size=2**16=65536` and `codebook_dim=16`.
+
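As a quick sanity check on these numbers (my own arithmetic, not from the repository): each discrete index into a `2**16`-entry codebook costs exactly 16 bits.

```python
import math

codebook_size = 2 ** 16                    # 65536 entries, as in the checkpoint above
bits_per_token = math.log2(codebook_size)  # bits needed to store one discrete index
print(bits_per_token)                      # 16.0
```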
+ ### Infer the model as VQ-VAE
+
+ * Then use the model as follows:
+ ```python
+ from PIL import Image
+ from torchvision import transforms
+ from omegaconf import OmegaConf
+ from pit.util import instantiate_from_config
+ import torch
+
+ transform = transforms.Compose([
+     transforms.Resize((256, 256)),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=[0.5, 0.5, 0.5],
+                          std=[0.5, 0.5, 0.5])
+ ])
+
+ img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
+ config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
+ vae = instantiate_from_config(config.model)
+ vae.load_state_dict(
+     torch.load("./model_256/sd3unet_gq_0.25.ckpt",
+                map_location=torch.device('cpu'))["state_dict"],
+     strict=False
+ )
+ vae = vae.eval().cuda()
+
+ z, log = vae.encode(img, return_reg_log=True)
+ z_q = vae.dequant(log["indices"])  # recover the quantized latent from the discrete indices
+ img_hat = vae.decode(z_q)          # decode the quantized latent
+ ```
+
+ ### Infer the model as Gaussian VAE
+
+ * Alternatively, the model can be used as a vanilla Gaussian VAE:
+ ```python
+ from PIL import Image
+ from torchvision import transforms
+ from omegaconf import OmegaConf
+ from pit.util import instantiate_from_config
+ import torch
+
+ transform = transforms.Compose([
+     transforms.Resize((256, 256)),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=[0.5, 0.5, 0.5],
+                          std=[0.5, 0.5, 0.5])
+ ])
+
+ img = transform(Image.open("demo.png")).unsqueeze(0).cuda()
+ config = OmegaConf.load("./configs/sd3unet_gq_0.25.yaml")
+ vae = instantiate_from_config(config.model)
+ vae.load_state_dict(
+     torch.load("./model_256/sd3unet_gq_0.25.ckpt",
+                map_location=torch.device('cpu'))["state_dict"],
+     strict=False
+ )
+ vae = vae.eval().cuda()
+
+ z = vae.encode(img, return_reg_log=True)[1]["zhat_noquant"]  # continuous Gaussian VAE latents
+ img_hat = vae.decode(z)
+ ```
+
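A simple way to compare the two inference modes is reconstruction PSNR of `img_hat` against `img`. The generic helper below is not from the repository; it assumes tensors normalized to `[-1, 1]`, i.e. a data range of 2:

```python
import torch

def psnr(x, x_hat, data_range=2.0):
    """PSNR in dB between two image tensors with the given data range."""
    mse = torch.mean((x - x_hat) ** 2)
    return 10.0 * torch.log10(data_range ** 2 / mse)

# Example on synthetic tensors: mse = 0.04, so 10 * log10(4 / 0.04) = 20 dB
a = torch.zeros(1, 3, 8, 8)
b = torch.full((1, 3, 8, 8), 0.2)
val = psnr(a, b)
```

Quantization should cost a few dB: the VQ-VAE path decodes `dequant(indices)`, while the Gaussian-VAE path decodes the unquantized `zhat_noquant`.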
+ ## Citation
+
+ If you find our work helpful or inspiring, please feel free to cite it:
+ ```bibtex
+ @misc{xu2025vectorquantizationusinggaussian,
+     title={Vector Quantization using Gaussian Variational Autoencoder},
+     author={Tongda Xu and Wendi Zheng and Jiajun He and Jose Miguel Hernandez-Lobato and Yan Wang and Ya-Qin Zhang and Jie Tang},
+     year={2025},
+     eprint={2512.06609},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG},
+     url={https://arxiv.org/abs/2512.06609},
+ }
+ ```