---
license: apache-2.0
pipeline_tag: unconditional-image-generation
tags:
- image-generation
- pixel-diffusion
---
# PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
PixelGen is a simple pixel diffusion framework that generates images directly in pixel space. Unlike latent diffusion models, it avoids the artifacts and bottlenecks of VAEs by introducing two complementary perceptual losses: an LPIPS loss for local patterns and a DINO-based perceptual loss for global semantics.
Project Page | Paper | GitHub
## Introduction
PixelGen achieves results competitive with latent diffusion models by modeling a more meaningful perceptual manifold rather than the full, high-dimensional pixel manifold. Key highlights include:
- FID 5.11 on ImageNet-256 without classifier-free guidance (CFG) in only 80 epochs.
- FID 1.83 on ImageNet-256 with CFG.
- GenEval score of 0.79 on large-scale text-to-image generation tasks.
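The role of the two complementary losses can be illustrated with a toy sketch. In actual PixelGen training the local term comes from a pretrained LPIPS network and the global term from a frozen DINO encoder; the fixed random projection, image size, patch size, and loss weights below are placeholder assumptions chosen only to keep the example self-contained, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 64   # toy image size; PixelGen itself operates at 256x256
C = 3
P = 16       # patch size for the local ("LPIPS-like") term
D = 32       # embedding dim for the global ("DINO-like") term

# Fixed random projection standing in for a pretrained DINO encoder;
# real training would use frozen, pretrained feature networks here.
W_global = rng.normal(size=(H * W * C, D)) / np.sqrt(H * W * C)

def to_patches(img, p=P):
    """Split an (H, W, C) image into flattened non-overlapping p x p patches."""
    h, w, c = img.shape
    img = img[: h - h % p, : w - w % p]
    return (img.reshape(img.shape[0] // p, p, img.shape[1] // p, p, c)
               .swapaxes(1, 2)
               .reshape(-1, p * p * c))

def local_loss(x, y):
    """Patch-wise MSE: penalizes mismatched local patterns (LPIPS's role)."""
    return float(np.mean((to_patches(x) - to_patches(y)) ** 2))

def global_loss(x, y):
    """Cosine distance between global embeddings (DINO's role)."""
    ex, ey = x.reshape(-1) @ W_global, y.reshape(-1) @ W_global
    cos = ex @ ey / (np.linalg.norm(ex) * np.linalg.norm(ey) + 1e-8)
    return float(1.0 - cos)

def perceptual_loss(x, y, w_local=1.0, w_global=1.0):
    """Weighted sum of the two complementary terms (weights illustrative)."""
    return w_local * local_loss(x, y) + w_global * global_loss(x, y)

target = rng.random((H, W, C))
pred = target + 0.1 * rng.normal(size=(H, W, C))

print(perceptual_loss(target, target))  # near zero for identical images
print(perceptual_loss(pred, target))    # positive for a noisy prediction
```

In the full framework this perceptual objective supplements the diffusion loss, steering the model toward perceptually meaningful structure instead of raw per-pixel fidelity.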
## Checkpoints
| Dataset | Model | Params | Performance |
|---|---|---|---|
| ImageNet-256 | PixelGen-XL/16 | 676M | 5.11 FID (w/o CFG) / 1.83 FID (w/ CFG) |
| Text-to-Image | PixelGen-XXL/16 | 1.1B | 0.79 GenEval Score |
## Usage
For detailed environment setup and training, please refer to the official GitHub repository.
### Inference
You can run inference using the provided configuration files and checkpoints:
```bash
# for inference without CFG using the 80-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL_without_CFG.yaml --ckpt_path=./ckpts/PixelGen_XL_80ep.ckpt

# for inference with CFG using the 160-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL.yaml --ckpt_path=./ckpts/PixelGen_XL_160ep.ckpt
```
## Citation
```bibtex
@article{ma2026pixelgen,
  title={PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss},
  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
  year={2026},
  eprint={2602.02493},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02493},
}
```