Improve model card and add metadata
Hi! I'm Niels from the community science team at Hugging Face. I've updated the model card to include links to the paper, project page, and official GitHub repository. I also added the `pipeline_tag` and additional metadata to improve the model's discoverability on the Hub. I've also included a brief description of the framework, a checkpoint summary, and inference usage examples derived from the official documentation.
README.md (the previous version contained only the YAML frontmatter with `license: apache-2.0`):
---
license: apache-2.0
pipeline_tag: unconditional-image-generation
tags:
- image-generation
- pixel-diffusion
---

# PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

PixelGen is a simple pixel diffusion framework that generates images directly in pixel space. Unlike latent diffusion models, it avoids the artifacts and bottlenecks of VAEs by introducing two complementary perceptual losses: an LPIPS loss for local patterns and a DINO-based perceptual loss for global semantics.

[**Project Page**](https://zehong-ma.github.io/PixelGen/) | [**Paper**](https://huggingface.co/papers/2602.02493) | [**GitHub**](https://github.com/Zehong-Ma/PixelGen)

## Introduction

PixelGen achieves results competitive with latent diffusion models by modeling a more meaningful perceptual manifold rather than the full, high-dimensional pixel manifold. Key highlights include:

- **FID 5.11** on ImageNet-256 without classifier-free guidance (CFG) in only 80 epochs.
- **FID 1.83** on ImageNet-256 with CFG.
- **GenEval score of 0.79** on large-scale text-to-image generation tasks.
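The role of the two perceptual terms can be sketched as follows. This is a minimal illustration, not the repository's implementation: the feature extractors here are stand-in stubs (the real model uses LPIPS's network features and DINO ViT embeddings), and the loss weights are illustrative assumptions, not the paper's values.

```python
import numpy as np

def lpips_like(pred_feats, target_feats):
    """Local perceptual term: mean squared distance between
    patch-level feature maps (stand-in for LPIPS features)."""
    return float(np.mean((pred_feats - target_feats) ** 2))

def dino_like(pred_emb, target_emb):
    """Global perceptual term: 1 - cosine similarity between
    image-level embeddings (stand-in for DINO ViT features)."""
    cos = np.dot(pred_emb, target_emb) / (
        np.linalg.norm(pred_emb) * np.linalg.norm(target_emb) + 1e-8
    )
    return float(1.0 - cos)

def pixel_diffusion_loss(pred_noise, true_noise,
                         pred_feats, target_feats,
                         pred_emb, target_emb,
                         w_local=1.0, w_global=1.0):
    """Total training objective: a standard diffusion MSE term plus
    the local and global perceptual terms. The weights w_local and
    w_global are illustrative, not the paper's settings."""
    mse = float(np.mean((pred_noise - true_noise) ** 2))
    return (mse
            + w_local * lpips_like(pred_feats, target_feats)
            + w_global * dino_like(pred_emb, target_emb))
```

The intent is that the local term penalizes texture-level artifacts while the global term keeps image-level semantics consistent, which is what lets the model skip a VAE latent space entirely.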

## Checkpoints

| Dataset       | Model           | Params | Performance                            |
|---------------|-----------------|--------|----------------------------------------|
| ImageNet256   | PixelGen-XL/16  | 676M   | 5.11 FID (w/o CFG) / 1.83 FID (w/ CFG) |
| Text-to-Image | PixelGen-XXL/16 | 1.1B   | 0.79 GenEval Score                     |

## Usage

For detailed environment setup and training instructions, please refer to the [official GitHub repository](https://github.com/Zehong-Ma/PixelGen).

### Inference

You can run inference using the provided configuration files and checkpoints:

```bash
# Inference without CFG, using the 80-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL_without_CFG.yaml --ckpt_path=./ckpts/PixelGen_XL_80ep.ckpt

# Inference with CFG, using the 160-epoch checkpoint
python main.py predict -c ./configs_c2i/PixelGen_XL.yaml --ckpt_path=./ckpts/PixelGen_XL_160ep.ckpt
```
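The two commands differ in whether classifier-free guidance is applied at sampling time. As a minimal sketch of the standard CFG combination at each denoising step (the guidance scale below is an illustrative assumption, not the repository's default):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, scale=3.0):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one. scale=1.0 recovers
    the plain conditional prediction; scale=0.0 the unconditional one.
    The default scale here is illustrative only."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Larger scales trade diversity for fidelity to the conditioning signal, which is why the with-CFG and without-CFG checkpoints report different FID numbers.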

## Citation

```bibtex
@article{ma2026pixelgen,
  title={PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss},
  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
  year={2026},
  eprint={2602.02493},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02493},
}
```