Improve model card and add metadata
#1
by nielsr (HF Staff), opened
README.md
CHANGED
---
license: apache-2.0
pipeline_tag: unconditional-image-generation
tags:
- image-generation
- pixel-diffusion
---

# PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

PixelGen is a simple pixel diffusion framework that generates images directly in pixel space. Unlike latent diffusion models, it avoids the artifacts and bottlenecks of VAEs by introducing two complementary perceptual losses: an LPIPS loss for local patterns and a DINO-based perceptual loss for global semantics.

[**Project Page**](https://zehong-ma.github.io/PixelGen/) | [**Paper**](https://huggingface.co/papers/2602.02493) | [**GitHub**](https://github.com/Zehong-Ma/PixelGen)

## Introduction

PixelGen achieves results competitive with latent diffusion models by modeling a more meaningful perceptual manifold rather than the full, high-dimensional pixel manifold. Key highlights:

- **FID 5.11** on ImageNet-256 without classifier-free guidance (CFG) in only 80 epochs.
- **FID 1.83** on ImageNet-256 with CFG.
- **GenEval score of 0.79** on large-scale text-to-image generation tasks.
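To make the objective concrete, here is a minimal, illustrative sketch of how a pixel-space reconstruction term can be combined with a local and a global perceptual term. The functions `lpips_like` and `dino_like` are simple NumPy stand-ins, not the real LPIPS network or DINO encoder, and the weights `w_lpips` and `w_dino` are hypothetical, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def lpips_like(a, b):
    # Stand-in for LPIPS: mean squared distance over local "patch features".
    # Real LPIPS compares deep network features; this is illustrative only.
    patches_a = a.reshape(-1, 4)  # fake 4-dim local patches
    patches_b = b.reshape(-1, 4)
    return float(np.mean((patches_a - patches_b) ** 2))

def dino_like(a, b):
    # Stand-in for a DINO perceptual loss: cosine distance between pooled
    # "global features". The real loss uses a pretrained DINO encoder.
    fa, fb = a.mean(axis=(0, 1)), b.mean(axis=(0, 1))
    cos = np.dot(fa, fb) / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8)
    return float(1.0 - cos)

def pixelgen_style_loss(pred, target, w_lpips=1.0, w_dino=1.0):
    # Total objective: pixel-space reconstruction plus the two perceptual terms.
    mse = float(np.mean((pred - target) ** 2))
    return mse + w_lpips * lpips_like(pred, target) + w_dino * dino_like(pred, target)

target = rng.standard_normal((8, 8, 4))
pred = target + 0.1 * rng.standard_normal((8, 8, 4))
print(pixelgen_style_loss(pred, target))
```

The point of the two perceptual terms is that they penalize different failure modes: the LPIPS-style term reacts to local texture mismatch, while the DINO-style term reacts to a drift in global, pooled semantics.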
## Checkpoints

| Dataset       | Model           | Params | Performance                            |
|---------------|-----------------|--------|----------------------------------------|
| ImageNet256   | PixelGen-XL/16  | 676M   | 5.11 FID (w/o CFG) / 1.83 FID (w/ CFG) |
| Text-to-Image | PixelGen-XXL/16 | 1.1B   | 0.79 GenEval score                     |

## Usage

For detailed environment setup and training instructions, please refer to the [official GitHub repository](https://github.com/Zehong-Ma/PixelGen).

### Inference

You can run inference using the provided configuration files and checkpoints:

```bash
# Inference without CFG, using the 80-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL_without_CFG.yaml --ckpt_path=./ckpts/PixelGen_XL_80ep.ckpt

# Inference with CFG, using the 160-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL.yaml --ckpt_path=./ckpts/PixelGen_XL_160ep.ckpt
```

## Citation

```bibtex
@article{ma2026pixelgen,
  title={PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss},
  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
  year={2026},
  eprint={2602.02493},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02493},
}
```