Improve model card and add metadata
#1
by nielsr (HF Staff), opened
README.md
CHANGED
---
license: apache-2.0
pipeline_tag: unconditional-image-generation
tags:
- image-generation
- pixel-diffusion
---

# PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

PixelGen is a simple pixel diffusion framework that generates images directly in pixel space. Unlike latent diffusion models, it avoids the artifacts and bottlenecks of VAEs by introducing two complementary perceptual losses: an LPIPS loss for local patterns and a DINO-based perceptual loss for global semantics.

[**Project Page**](https://zehong-ma.github.io/PixelGen/) | [**Paper**](https://huggingface.co/papers/2602.02493) | [**GitHub**](https://github.com/Zehong-Ma/PixelGen)

## Introduction

PixelGen achieves results competitive with latent diffusion models by modeling a more meaningful perceptual manifold rather than the full, high-dimensional pixel manifold. Key highlights:

- **FID 5.11** on ImageNet-256 without classifier-free guidance (CFG) in only 80 epochs.
- **FID 1.83** on ImageNet-256 with CFG.
- **GenEval score of 0.79** on large-scale text-to-image generation tasks.
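To make the objective concrete, here is a minimal, illustrative sketch of how a pixel-space reconstruction term can be combined with a local and a global perceptual term. The functions `lpips_like` and `dino_like` are simple NumPy stand-ins, not the real LPIPS network or DINO encoder, and the weights `w_lpips` and `w_dino` are hypothetical, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(0)

def lpips_like(a, b):
    # Stand-in for LPIPS: mean squared distance over local "patch features".
    # Real LPIPS compares deep network features; this is illustrative only.
    patches_a = a.reshape(-1, 4)  # fake 4-dim local patches
    patches_b = b.reshape(-1, 4)
    return float(np.mean((patches_a - patches_b) ** 2))

def dino_like(a, b):
    # Stand-in for a DINO perceptual loss: cosine distance between pooled
    # "global features". The real loss uses a pretrained DINO encoder.
    fa, fb = a.mean(axis=(0, 1)), b.mean(axis=(0, 1))
    cos = np.dot(fa, fb) / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8)
    return float(1.0 - cos)

def pixelgen_style_loss(pred, target, w_lpips=1.0, w_dino=1.0):
    # Total objective: pixel-space reconstruction plus the two perceptual terms.
    mse = float(np.mean((pred - target) ** 2))
    return mse + w_lpips * lpips_like(pred, target) + w_dino * dino_like(pred, target)

target = rng.standard_normal((8, 8, 4))
pred = target + 0.1 * rng.standard_normal((8, 8, 4))
print(pixelgen_style_loss(pred, target))
```

The point of the two perceptual terms is that they penalize different failure modes: the LPIPS-style term reacts to local texture mismatch, while the DINO-style term reacts to a drift in global, pooled semantics.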
## Checkpoints

| Dataset       | Model           | Params | Performance                            |
|---------------|-----------------|--------|----------------------------------------|
| ImageNet256   | PixelGen-XL/16  | 676M   | 5.11 FID (w/o CFG) / 1.83 FID (w/ CFG) |
| Text-to-Image | PixelGen-XXL/16 | 1.1B   | 0.79 GenEval score                     |

## Usage

For detailed environment setup and training instructions, please refer to the [official GitHub repository](https://github.com/Zehong-Ma/PixelGen).

### Inference

You can run inference using the provided configuration files and checkpoints:

```bash
# Inference without CFG, using the 80-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL_without_CFG.yaml --ckpt_path=./ckpts/PixelGen_XL_80ep.ckpt

# Inference with CFG, using the 160-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL.yaml --ckpt_path=./ckpts/PixelGen_XL_160ep.ckpt
```

## Citation

```bibtex
@article{ma2026pixelgen,
  title={PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss},
  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
  year={2026},
  eprint={2602.02493},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02493},
}
```