---
tags:
- image-to-image
- vae
---

# MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

[![arXiv](https://img.shields.io/badge/ArXiv-2507.07997-%23840707.svg)](https://arxiv.org/abs/2507.07997)

[Mingkai Jia](https://scholar.google.com/citations?user=fcpTdvcAAAAJ&hl=zh-CN)¹,², [Wei Yin](https://yvanyin.net/)²\*§, [Xiaotao Hu](https://huxiaotaostasy.github.io/)¹,², [Jiaxin Guo](https://wrld.github.io/)³, [Xiaoyang Guo](https://xy-guo.github.io/)²,
[Qian Zhang](https://scholar.google.com.hk/citations?hl=zh-CN&user=pCY-bikAAAAJ)², [Xiao-Xiao Long](https://www.xxlong.site/)⁴, [Ping Tan](https://scholar.google.com/citations?user=XhyKVFMAAAAJ&hl=en)¹
[HKUST](https://hkust.edu.hk/)¹, [Horizon Robotics](https://en.horizon.auto/)², [CUHK](https://cuhk.edu.hk/)³, [NJU](https://www.nju.edu.cn/)⁴
\* Corresponding Author, § Project Leader

## 🚀News

- ```[August 2025]``` Achieved SOTA on the Papers with Code leaderboards: Image Reconstruction on ImageNet and UHDBench.

- ```[August 2025]``` Released inference code.
- ```[August 2025]``` Released [model zoo](https://huggingface.co/mkjia/MGVQ/tree/main).
- ```[August 2025]``` Released our proposed [UHDBench dataset](https://huggingface.co/datasets/mkjia/UHDBench/tree/main) for ultra-high-definition image reconstruction evaluation.
- ```[July 2025]``` Released [paper](https://arxiv.org/abs/2507.07997).

## 🔨TO DO LIST

- [ ] Training code.
- [ ] More demos.
- [x] Models & Evaluation code.
- [x] Huggingface models.
- [x] Release zero-shot reconstruction benchmarks.

## 🙈 Model Zoo

| Model | Downsample | Groups | Codebook Size | Training Data | Link |
|---|---|---|---|---|---|
| mgvq-f8c32-g4 | 8 | 4 | 32768 | imagenet | [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f8c32_g4.pt) |
| mgvq-f8c32-g8 | 8 | 8 | 16384 | imagenet | [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f8c32_g8.pt) |
| mgvq-f16c32-g4 | 16 | 4 | 32768 | imagenet | [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g4.pt) |
| mgvq-f16c32-g8 | 16 | 8 | 16384 | imagenet | [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g8.pt) |
| mgvq-f16c32-g4-mix | 16 | 4 | 32768 | mix | [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g4_mix.pt) |
| mgvq-f32c32-g8-mix | 32 | 8 | 16384 | mix | [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f32c32_g8_mix.pt) |

## 🔑 Quick Start

### Installation

```bash
git clone https://github.com/MKJia/MGVQ.git
cd MGVQ
pip3 install -r requirements.txt
```

### Download models

Download the pretrained models from our [model zoo](https://huggingface.co/mkjia/MGVQ/tree/main) to your `/path/to/your/ckpt`.

### Data Preparation

Try our UHDBench dataset on [huggingface](https://huggingface.co/datasets/mkjia/UHDBench/tree/main) and download it to your `/path/to/your/dataset`.

### Evaluation on Reconstruction

Remember to change the paths of `ckpt` and `dataset_root`, and make sure you are evaluating the expected `model` on `dataset`.
```bash
cd evaluation
bash eval_recon.sh
```

### Generation Demo & Evaluation

You can download the pretrained GPT model for generation from [huggingface](https://huggingface.co/datasets/mkjia/MGVQ/blob/main/MGVQ_GPT_XXL.pt), and test it with our `mgvq-f16c32-g4` [tokenizer model](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g4.pt) for demo image sampling. Remember to change the paths of `gpt_ckpt` and `vq_ckpt`.

```bash
cd evaluation
bash demo_gen.sh
```

We also provide our .npz file on [huggingface](https://huggingface.co/datasets/mkjia/MGVQ/blob/main/GPT_XXL_300ep_topk_12.npz), sampled by `sample_c2i_ddp.py`, for evaluation.

```bash
cd evaluation
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz
```

## 🗄️Demos

- 🔥 Qualitative reconstruction images with 16× downsampling on the 2560×1440 UHDBench dataset.

- 🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.

- 🔥 Reconstruction evaluation on the 256×256 ImageNet benchmark.

- 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets.

- 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets.
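
To make the downsample ratios and resolutions listed above concrete, here is a minimal sketch that computes the latent grid size for a given input. It rests on two assumptions that this README does not state: each spatial side shrinks by the downsample ratio (rounded up, as if the input were padded), and every latent position is quantized into one codebook index per group. The function name `latent_token_count` is hypothetical and is not part of the MGVQ codebase.

```python
def latent_token_count(width, height, downsample, groups):
    """Return (grid_w, grid_h, positions, indices) for a single image.

    Assumptions (not taken from the MGVQ code):
      * each side is divided by `downsample`, rounding up;
      * each latent position produces one index per group.
    """
    # Integer ceiling division avoids floating-point rounding.
    grid_w = (width + downsample - 1) // downsample
    grid_h = (height + downsample - 1) // downsample
    positions = grid_w * grid_h
    indices = positions * groups
    return grid_w, grid_h, positions, indices


if __name__ == "__main__":
    # Resolutions and ratio from the demo list; groups=4 as in mgvq-f16c32-g4.
    for w, h in [(256, 256), (512, 512), (2560, 1440)]:
        print((w, h), latent_token_count(w, h, downsample=16, groups=4))
```

Under these assumptions, a 2560×1440 image at a downsample ratio of 16 maps to a 160×90 grid, i.e. 14400 latent positions, which is why ultra-high-definition reconstruction is a much heavier test than the 256×256 benchmark with its 16×16 grid.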

## 📌 Citation

If the paper and code from `MGVQ` help your research, we kindly ask you to cite our paper ❤️. Additionally, if you appreciate our work and find this repository useful, giving it a star ⭐️ would be a wonderful way to support our work. Thank you very much.

```bibtex
@article{jia2025mgvq,
  title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization},
  author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping},
  journal={arXiv preprint arXiv:2507.07997},
  year={2025}
}
```

## License

This repository is under the MIT License. For more license questions, please contact Mingkai Jia (mjiaab@connect.ust.hk) and Wei Yin (yvanwy@outlook.com).