	---
	license: mit
	pipeline_tag: unconditional-image-generation
	datasets:
	- ILSVRC/imagenet-1k
	- uoft-cs/cifar10
	- marcosv/ffhq-dataset
	---

	# Neon: Negative Extrapolation From Self-Training Improves Image Generation

	This repository contains the official implementation for the paper [Neon: Negative Extrapolation From Self-Training Improves Image Generation](https://huggingface.co/papers/2510.03597).

	## Abstract

	Scaling generative AI models is bottlenecked by the scarcity of high-quality training data. The ease of synthesizing from a generative model suggests using (unverified) synthetic data to augment a limited corpus of real data for the purpose of fine-tuning in the hope of improving performance. Unfortunately, however, the resulting positive feedback loop leads to model autophagy disorder (MAD, aka model collapse) that results in a rapid degradation in sample quality and/or diversity. In this paper, we introduce Neon (for Negative Extrapolation frOm self-traiNing), a new learning method that turns the degradation from self-training into a powerful signal for self-improvement. Given a base model, Neon first fine-tunes it on its own self-synthesized data but then, counterintuitively, reverses its gradient updates to extrapolate away from the degraded weights. We prove that Neon works because typical inference samplers that favor high-probability regions create a predictable anti-alignment between the synthetic and real data population gradients, which negative extrapolation corrects to better align the model with the true data distribution. Neon is remarkably easy to implement via a simple post-hoc merge that requires no new real data, works effectively with as few as 1k synthetic samples, and typically uses less than 1% additional training compute. We demonstrate Neon's universality across a range of architectures (diffusion, flow matching, autoregressive, and inductive moment matching models) and datasets (ImageNet, CIFAR-10, and FFHQ). In particular, on ImageNet 256x256, Neon elevates the xAR-L model to a new state-of-the-art FID of 1.02 with only 0.36% additional training compute.

	## Official Resources

	* Paper: [https://huggingface.co/papers/2510.03597](https://huggingface.co/papers/2510.03597)
	* GitHub Repository: [https://github.com/VITA-Group/Neon](https://github.com/VITA-Group/Neon)

	## Method

	![Algorithm 1: Neon — Negative Extrapolation from Self-Training](https://github.com/VITA-Group/Neon/raw/main/assets/algorithm.png)

	In one line: sample with your usual inference procedure to form a synthetic set $S$; briefly fine-tune the reference model $\theta_r$ on $S$ to get $\theta_s$; then reverse that update with the merge $\theta_{\text{neon}}=(1+w)\,\theta_r - w\,\theta_s$ (small $w>0$), equivalently $\theta_{\text{neon}}=\theta_r - w\,(\theta_s-\theta_r)$, which cancels mode-seeking drift and improves FID.
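The merge step can be sketched in a few lines. This is a minimal illustration of the formula above, not the repository's code: the function name `neon_merge` and the dictionary layout are assumptions, and the values can be plain floats, NumPy arrays, or tensors, as long as they support `+`, `-`, and scalar `*`.

```python
from typing import Dict, TypeVar

# Any parameter type that supports +, - and multiplication by a float
# (float, NumPy array, torch tensor, ...). Illustrative only.
T = TypeVar("T")


def neon_merge(theta_r: Dict[str, T], theta_s: Dict[str, T], w: float) -> Dict[str, T]:
    """Hypothetical helper: theta_neon = (1 + w) * theta_r - w * theta_s.

    theta_r: parameters of the reference (base) model
    theta_s: parameters after brief fine-tuning on the synthetic set S
    w:       extrapolation weight, assumed small and positive
    """
    if w <= 0:
        raise ValueError("w must be positive")
    if theta_r.keys() != theta_s.keys():
        raise KeyError("reference and fine-tuned parameters must share the same names")
    # Same as theta_r - w * (theta_s - theta_r): step away from the fine-tuned weights.
    return {name: (1.0 + w) * theta_r[name] - w * theta_s[name] for name in theta_r}


# Toy example with scalar "parameters"
theta_r = {"weight": 1.0, "bias": -2.0}
theta_s = {"weight": 2.0, "bias": -1.0}
theta_neon = neon_merge(theta_r, theta_s, w=0.5)
```

With real checkpoints the two dictionaries would typically be model state dictionaries. Non-floating-point entries (integer counters, for example) would probably need to be copied from the reference model rather than extrapolated; that handling is an assumption and is not covered here.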

	## Benchmark Performance

	| Model type    | Dataset          | Base model FID | Neon FID (paper) | Download model |
	| ------------- | ---------------- | -------------: | ---------------: | -------------- |
	| xAR-L         | ImageNet-256     |           1.28 |             1.02 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_xARL_imagenet256.pth) |
	| xAR-B         | ImageNet-256     |           1.72 |             1.31 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_xARB_imagenet256.pth) |
	| VAR d16       | ImageNet-256     |           3.30 |             2.01 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_VARd16_imagenet256.pth) |
	| VAR d36       | ImageNet-512     |           2.63 |             1.70 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_VARd36_imagenet512.pth) |
	| EDM (cond.)   | CIFAR-10 (32×32) |           1.78 |             1.38 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_EDM_conditional_CIFAR10.pkl) |
	| EDM (uncond.) | CIFAR-10 (32×32) |           1.98 |             1.38 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_EDM_unconditional_CIFAR10.pkl) |
	| EDM           | FFHQ-64×64       |           2.39 |             1.12 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_EDM_FFHQ.pkl) |
	| IMM           | ImageNet-256     |           1.99 |             1.46 | [Download](https://huggingface.co/sinaalemohammad/Neon/resolve/main/Neon_imm_imagenet256.pkl) |
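The download links in the table above all share one URL prefix, so a single checkpoint can be fetched with the Python standard library alone. The sketch below is only an illustration: the helper names `checkpoint_url` and `download_checkpoint`, the dictionary keys, and the default `checkpoints` output directory are assumptions, and the repository's `download_models.sh` remains the documented way to fetch everything.

```python
import urllib.request
from pathlib import Path

# Common prefix of the download links in the table above.
BASE_URL = "https://huggingface.co/sinaalemohammad/Neon/resolve/main"

# File names copied from the table; the keys are illustrative labels.
CHECKPOINTS = {
    "xAR-L": "Neon_xARL_imagenet256.pth",
    "xAR-B": "Neon_xARB_imagenet256.pth",
    "VAR d16": "Neon_VARd16_imagenet256.pth",
    "VAR d36": "Neon_VARd36_imagenet512.pth",
    "EDM (cond.)": "Neon_EDM_conditional_CIFAR10.pkl",
    "EDM (uncond.)": "Neon_EDM_unconditional_CIFAR10.pkl",
    "EDM FFHQ": "Neon_EDM_FFHQ.pkl",
    "IMM": "Neon_imm_imagenet256.pkl",
}


def checkpoint_url(model: str) -> str:
    """Build the direct download URL for one of the checkpoints listed above."""
    return f"{BASE_URL}/{CHECKPOINTS[model]}"


def download_checkpoint(model: str, out_dir: str = "checkpoints") -> Path:
    """Download a checkpoint into out_dir unless the file is already there."""
    target = Path(out_dir) / CHECKPOINTS[model]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(checkpoint_url(model), target)
    return target
```

For example, `download_checkpoint("xAR-L")` would save the xAR-L weights under `checkpoints/`; the call needs network access and is left out of the snippet for that reason.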

	## Quickstart & Evaluation

	For environment setup, downloading pretrained models, and evaluation scripts (for FID/IS), please refer to the [GitHub repository's Quickstart section](https://github.com/VITA-Group/Neon#quickstart).

	## Repository Map

	```
	Neon/
	├── VAR/ # VAR baselines + eval scripts
	├── xAR/ # xAR baselines + eval scripts (uses MAR VAE)
	├── edm/ # EDM baselines + metrics/scripts
	├── imm/ # IMM baselines + eval scripts
	├── toy_appendix.ipynb # 2D Gaussian toy example (diffusion & AR)
	├── download_models.sh # Grab all checkpoints + FID refs
	├── environment.yml # Reproducible env
	└── checkpoints/, fid_stats/ (created by the script)
	```

	## Citation

	```bibtex
	@article{alemohammadneon2025,
	title = {Neon: Negative Extrapolation From Self-Training Improves Image Generation},
	author = {Alemohammad, Sina and Wang, Zhangyang and Baraniuk, Richard G.},
	journal = {arXiv preprint arXiv:2510.03597},
	year = {2025},
	url = {https://arxiv.org/abs/2510.03597}
	}
	```

	## Contact

	Questions? Reach out to Sina Alemohammad — [sinaalemohammad@gmail.com](mailto:sinaalemohammad@gmail.com).

	## Acknowledgments

	This repository builds upon the following projects, and we thank their authors:

	* [VAR — Visual AutoRegressive Modeling](https://github.com/FoundationVision/VAR)
	* [xAR — Beyond Next-Token: Next-X Prediction](https://github.com/OliverRensu/xAR)
	* [IMM — Inductive Moment Matching](https://github.com/lumalabs/imm)
	* [EDM — Elucidating the Design Space of Diffusion Models](https://github.com/NVlabs/edm)
	* [MAR VAE (KL-16) tokenizer](https://huggingface.co/xwen99/mar-vae-kl16)