---
extra_gated_fields:
  Name: text
  Institute: text
  Institutional Email: text
  I agree to use this model for non-commercial use ONLY: checkbox
---

# Model Card for NVG series

<p align="center">
  <h1 align="center">Next Visual Granularity Generation</h1>
  <center>Yikai Wang, Zhouxia Wang, Zhonghua Wu, Qingyi Tao, Kang Liao, Chen Change Loy.<br>
  S-Lab, Nanyang Technological University; SenseTime Research<br></center>
  <p align="center">
    <a href="https://arxiv.org/abs/2508.12811"><img alt='arXiv' src="https://img.shields.io/badge/arXiv-2508.12811-b31b1b.svg"></a>
    <a href="https://yikai-wang.github.io/nvg/"><img alt='page' src="https://img.shields.io/badge/Project-Website-orange"></a>
  </p>
</p>

## Model Details

### Model Description

We propose a novel approach to image generation that decomposes an image into a structured sequence, where each element of the sequence shares the same spatial resolution but differs in the number of unique tokens used, capturing different levels of visual granularity.

Image generation is carried out through our newly introduced Next Visual Granularity (NVG) generation framework, which generates a visual granularity sequence beginning from an empty image and progressively refines it, from global layout to fine details, in a structured manner. This iterative process encodes a hierarchical, layered representation that offers fine-grained control over the generation process across multiple granularity levels.

We train a series of NVG models for class-conditional image generation on the ImageNet dataset and observe clear scaling behavior. NVG consistently achieves better FID scores than the corresponding VAR models (3.30 → 3.03, 2.57 → 2.44, 2.09 → 2.06). We also conduct extensive analysis to showcase the capability and potential of the NVG framework. Code and models are available via the links below.
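As a rough illustration of the coarse-to-fine idea only (this is not the actual NVG implementation; the function name, grid shape, and token scheme below are hypothetical), the sketch keeps the spatial resolution fixed across all stages while allowing each later stage to use more unique tokens:

```python
import random

def toy_granularity_sequence(size=8, vocab_sizes=(1, 2, 4, 8), seed=0):
    """Toy coarse-to-fine sketch: every stage is a size x size grid of
    token ids, but later stages may draw from a larger vocabulary,
    loosely mimicking finer visual granularity at a fixed resolution."""
    rng = random.Random(seed)
    sequence = []
    for k in vocab_sizes:
        # Re-draw each position from a vocabulary of k token ids;
        # the first stage (k=1) is the all-zero "empty" image.
        stage = [[rng.randrange(k) for _ in range(size)] for _ in range(size)]
        sequence.append(stage)
    return sequence
```

In the real model, each stage would of course be predicted by the network conditioned on the previous stage and the class label, rather than sampled at random as in this toy.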
- **License:** S-Lab License 1.0

### Model Sources

- **Code:** https://github.com/Yikai-Wang/nvg
- **Paper:** https://arxiv.org/abs/2508.12811

## Uses

Usage instructions are illustrated in the GitHub repository linked above.

## Citation

**BibTeX:**

```bibtex
@article{wang2025next,
  title={Next Visual Granularity Generation},
  author={Wang, Yikai and Wang, Zhouxia and Wu, Zhonghua and Tao, Qingyi and Liao, Kang and Loy, Chen Change},
  journal={arXiv preprint arXiv:2508.12811},
  year={2025}
}
```