|
|
--- |
|
|
tags: |
|
|
- image-to-image |
|
|
- vae |
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
<h1>MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization</h1> |
|
|
|
|
|
[![arXiv](https://img.shields.io/badge/arXiv-2507.07997-b31b1b.svg)](https://arxiv.org/abs/2507.07997)
|
|
|
|
|
|
|
|
[Mingkai Jia](https://scholar.google.com/citations?user=fcpTdvcAAAAJ&hl=zh-CN)<sup>1,2</sup>, [Wei Yin](https://yvanyin.net/)<sup>2*§</sup>, [Xiaotao Hu](https://huxiaotaostasy.github.io/)<sup>1,2</sup>, [Jiaxin Guo](https://wrld.github.io/)<sup>3</sup>, [Xiaoyang Guo](https://xy-guo.github.io/)<sup>2</sup><br> |
|
|
[Qian Zhang](https://scholar.google.com.hk/citations?hl=zh-CN&user=pCY-bikAAAAJ)<sup>2</sup>, [Xiao-Xiao Long](https://www.xxlong.site/)<sup>4</sup>, [Ping Tan](https://scholar.google.com/citations?user=XhyKVFMAAAAJ&hl=en)<sup>1</sup><br> |
|
|
|
|
|
[HKUST](https://hkust.edu.hk/)<sup>1</sup>, [Horizon Robotics](https://en.horizon.auto/)<sup>2</sup>, [CUHK](https://cuhk.edu.hk/)<sup>3</sup>, [NJU](https://www.nju.edu.cn/)<sup>4</sup><br> |
|
|
<sup>*</sup> Corresponding Author, <sup>§</sup> Project Leader |
|
|
<br><br><img src='https://huggingface.co/mkjia/MGVQ/resolve/main/assets/teaser.png'/>
|
|
</div> |
|
|
|
|
|
|
|
|
## 🚀 News
|
|
- ```[August 2025]``` Achieved SOTA on the Papers with Code leaderboards for image reconstruction on ImageNet and UHDBench. <img src='https://huggingface.co/mkjia/MGVQ/raw/main/assets/SOTA_recon_fid_imagenet_badge.jpg'/> <img src='https://huggingface.co/mkjia/MGVQ/raw/main/assets/SOTA_recon_PSNR_UHD_badge.jpg'/>
|
|
- ```[August 2025]``` Released inference code.
|
|
- ```[August 2025]``` Released the [model zoo](https://huggingface.co/mkjia/MGVQ/tree/main).
|
|
- ```[August 2025]``` Released the [UHDBench dataset](https://huggingface.co/datasets/mkjia/UHDBench/tree/main), our proposed benchmark for ultra-high-definition image reconstruction evaluation.
|
|
- ```[July 2025]``` Released [paper](https://arxiv.org/abs/2507.07997). |
|
|
|
|
|
## 🔨 To-Do List
|
|
- [ ] Training code. |
|
|
- [ ] More demos. |
|
|
- [x] Models & Evaluation code. |
|
|
- [x] Hugging Face models.
|
|
- [x] Zero-shot reconstruction benchmarks.
|
|
|
|
|
## 🙈 Model Zoo |
|
|
| Model | Downsample | Groups | Codebook Size | Training Data | Link |
|---|---|---|---|---|---|
|mgvq-f8c32-g4|8|4|32768|ImageNet| [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f8c32_g4.pt) |
|mgvq-f8c32-g8|8|8|16384|ImageNet| [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f8c32_g8.pt) |
|mgvq-f16c32-g4|16|4|32768|ImageNet| [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g4.pt) |
|mgvq-f16c32-g8|16|8|16384|ImageNet| [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g8.pt) |
|mgvq-f16c32-g4-mix|16|4|32768|mix| [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g4_mix.pt) |
|mgvq-f32c32-g8-mix|32|8|16384|mix| [link](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f32c32_g8_mix.pt) |
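Reading the model names: `f{N}` is the spatial downsample factor, `g{G}` the number of quantization groups, and `c32` presumably a 32-channel latent. To illustrate the multi-group idea, below is a minimal PyTorch sketch that splits a latent into groups and quantizes each against its own codebook; the shapes and function names are illustrative assumptions, not the repository's implementation.

```python
import torch

def multi_group_quantize(z, codebooks):
    """Nearest-neighbor quantization of a latent, one codebook per group.

    z:         (B, C, H, W) encoder latent
    codebooks: list of G tensors, each (K, C // G)
    Returns the quantized latent (B, C, H, W) and per-group index maps (B, H, W).
    """
    groups = torch.chunk(z, len(codebooks), dim=1)           # split channels into G groups
    quantized, indices = [], []
    for g, book in zip(groups, codebooks):
        b, c, h, w = g.shape
        flat = g.permute(0, 2, 3, 1).reshape(-1, c)          # (B*H*W, C // G)
        idx = torch.cdist(flat, book).argmin(dim=1)          # nearest code per position
        q = book[idx].view(b, h, w, c).permute(0, 3, 1, 2)   # back to (B, C // G, H, W)
        quantized.append(q)
        indices.append(idx.view(b, h, w))
    return torch.cat(quantized, dim=1), indices

# Example in the spirit of mgvq-f16c32-g4: 32 latent channels, 4 groups, 32768 codes each
z = torch.randn(1, 32, 16, 16)
books = [torch.randn(32768, 8) for _ in range(4)]
z_q, idx = multi_group_quantize(z, books)
print(z_q.shape, idx[0].shape)  # torch.Size([1, 32, 16, 16]) torch.Size([1, 16, 16])
```

Splitting quantization across groups keeps each per-group lookup cheap while the effective joint vocabulary grows multiplicatively (e.g. 32768^4 code combinations per spatial position for a g4 model).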
|
|
|
|
|
## 🔑 Quick Start |
|
|
<a id="quick-start"></a>
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
git clone https://github.com/MKJia/MGVQ.git |
|
|
cd MGVQ |
|
|
pip3 install -r requirements.txt
|
|
``` |
|
|
|
|
|
### Download models |
|
|
Download the pretrained models from our [model zoo](https://huggingface.co/mkjia/MGVQ/tree/main) to your `/path/to/your/ckpt`. |
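For example, checkpoints can be fetched programmatically with `huggingface_hub` (a sketch; pick any filename from the model zoo table):

```python
from huggingface_hub import hf_hub_download

# Fetch one tokenizer checkpoint from the model zoo into the local HF cache
ckpt_path = hf_hub_download(repo_id="mkjia/MGVQ", filename="mgvq_f16c32_g4.pt")
print(ckpt_path)  # pass this path as `ckpt` in the evaluation scripts
```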
|
|
|
|
|
### Data Preparation |
|
|
Try our UHDBench dataset on [huggingface](https://huggingface.co/datasets/mkjia/UHDBench/tree/main) and download to your `/path/to/your/dataset`. |
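One way to fetch the dataset programmatically (a sketch using `huggingface_hub`; the local directory is a placeholder):

```python
from huggingface_hub import snapshot_download

# Download the full UHDBench dataset repository to a local directory
dataset_root = snapshot_download(
    repo_id="mkjia/UHDBench",
    repo_type="dataset",
    local_dir="/path/to/your/dataset",
)
```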
|
|
|
|
|
### Evaluation on Reconstruction |
|
|
Before running, set the `ckpt` and `dataset_root` paths in the script, and make sure you are evaluating the intended `model` on the intended `dataset`.
|
|
```bash |
|
|
cd evaluation |
|
|
bash eval_recon.sh
|
|
``` |
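For intuition, the PSNR metric reported on UHDBench can be computed between an original and a reconstructed image as follows (a self-contained sketch, not the repository's evaluation code):

```python
import numpy as np

def psnr(original: np.ndarray, reconstruction: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    mse = np.mean((original.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val**2 / mse)

# Toy example at UHDBench resolution (2560x1440 RGB)
a = np.random.randint(0, 256, (1440, 2560, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (1440, 2560, 3), dtype=np.uint8)
print(f"PSNR: {psnr(a, b):.2f} dB")
```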
|
|
|
|
|
### Generation Demo & Evaluation
|
|
You can download the pretrained GPT model for generation from [Hugging Face](https://huggingface.co/datasets/mkjia/MGVQ/blob/main/MGVQ_GPT_XXL.pt) and test it with our `mgvq-f16c32-g4` [tokenizer model](https://huggingface.co/mkjia/MGVQ/blob/main/mgvq_f16c32_g4.pt) for demo image sampling. Remember to change the paths of `gpt_ckpt` and `vq_ckpt`.
|
|
```bash
|
|
cd evaluation |
|
|
bash demo_gen.sh
|
|
``` |
|
|
We also provide a `.npz` file on [Hugging Face](https://huggingface.co/datasets/mkjia/MGVQ/blob/main/GPT_XXL_300ep_topk_12.npz), sampled by `sample_c2i_ddp.py`, for evaluation.
|
|
```bash
|
|
cd evaluation |
|
|
python3 evaluator.py /path/to/your/VIRTUAL_imagenet256_labeled.npz /path/to/your/GPT_XXL_300ep_topk_12.npz |
|
|
``` |
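`evaluator.py` follows the common two-file protocol: a reference batch (`VIRTUAL_imagenet256_labeled.npz`) and a sample batch. If you sample images yourself, packing them into a `.npz` typically looks like the sketch below; the `arr_0` key and array shape follow the usual convention for such evaluators and are assumptions, not confirmed details of this repository:

```python
import numpy as np

# Hypothetical: N generated 256x256 RGB samples stacked as uint8
samples = np.zeros((50000, 256, 256, 3), dtype=np.uint8)  # replace with real samples
np.savez("my_samples.npz", arr_0=samples)                  # conventional key for evaluators
```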
|
|
|
|
|
|
|
|
## 🗄️ Demos
|
|
- 🔥 Qualitative reconstruction images with 16× downsampling on the 2560×1440 UHDBench dataset.
|
|
|
|
|
<img src='https://huggingface.co/mkjia/MGVQ/resolve/main/assets/qual_recon.png'/>
|
|
|
|
|
- 🔥 Qualitative class-to-image generation on ImageNet. The classes are dog (Golden Retriever and Husky), cliff, and bald eagle.
|
|
|
|
|
<img src='https://huggingface.co/mkjia/MGVQ/resolve/main/assets/qual_gen.png'/>
|
|
|
|
|
- 🔥 Reconstruction evaluation on the 256×256 ImageNet benchmark.
|
|
|
|
|
<img src='https://huggingface.co/mkjia/MGVQ/resolve/main/assets/recon_tab_1.jpg'/>
|
|
|
|
|
- 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 512×512 datasets. |
|
|
|
|
|
<img src='https://huggingface.co/mkjia/MGVQ/resolve/main/assets/recon_tab_2.jpg'/>
|
|
|
|
|
- 🔥 Zero-shot reconstruction evaluation with a downsample ratio of 16 on 2560×1440 datasets. |
|
|
|
|
|
<div align="center"><img src='https://huggingface.co/mkjia/MGVQ/resolve/main/assets/recon_tab_3.jpg'/></div>
|
|
|
|
|
|
|
|
|
|
## 📌 Citation |
|
|
|
|
|
If the `MGVQ` paper or code helps your research, we kindly ask you to cite our paper ❤️. If you find this repository useful, giving it a star ⭐️ is a wonderful way to support our work. Thank you very much.
|
|
|
|
|
```bibtex |
|
|
@article{jia2025mgvq, |
|
|
title={MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization}, |
|
|
author={Jia, Mingkai and Yin, Wei and Hu, Xiaotao and Guo, Jiaxin and Guo, Xiaoyang and Zhang, Qian and Long, Xiao-Xiao and Tan, Ping}, |
|
|
journal={arXiv preprint arXiv:2507.07997}, |
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
This repository is released under the MIT License. For further licensing questions, please contact Mingkai Jia (mjiaab@connect.ust.hk) or Wei Yin (yvanwy@outlook.com).
|
|
|
|
|
|