nielsr HF Staff committed on
Commit 4cdaf81 · verified · 1 Parent(s): 3e55c85

Improve model card and add metadata


Hi! I'm Niels from the community science team at Hugging Face. I've updated the model card to include links to the paper, project page, and official GitHub repository. I also added the `pipeline_tag` and additional metadata to improve the model's discoverability on the Hub. I've also included a brief description of the framework, a checkpoint summary, and inference usage examples derived from the official documentation.

Files changed (1)
  1. README.md +54 -3
README.md CHANGED
@@ -1,3 +1,54 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: unconditional-image-generation
+ tags:
+ - image-generation
+ - pixel-diffusion
+ ---
+
+ # PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
+
+ PixelGen is a simple pixel diffusion framework that generates images directly in pixel space. Unlike latent diffusion models, it avoids the artifacts and bottlenecks of VAEs by introducing two complementary perceptual losses: an LPIPS loss for local patterns and a DINO-based perceptual loss for global semantics.
+
+ [**Project Page**](https://zehong-ma.github.io/PixelGen/) | [**Paper**](https://huggingface.co/papers/2602.02493) | [**GitHub**](https://github.com/Zehong-Ma/PixelGen)
+
+ ## Introduction
+
+ PixelGen achieves competitive results compared to latent diffusion models by modeling a more meaningful perceptual manifold rather than the full, high-dimensional pixel manifold. Key highlights include:
+ - **FID 5.11** on ImageNet-256 without classifier-free guidance (CFG) in only 80 epochs.
+ - **FID 1.83** on ImageNet-256 with CFG.
+ - **GenEval score of 0.79** on large-scale text-to-image generation tasks.
+
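The two-term perceptual objective described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `LocalFeatures` and `GlobalFeatures` are tiny stand-ins for the actual frozen LPIPS (VGG-based) and DINO backbones, and the loss weights `w_local`/`w_global` are placeholder values, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalFeatures(nn.Module):
    """Stand-in for an LPIPS backbone: multi-scale conv feature maps."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = F.relu(stage(x))
            feats.append(x)
        return feats

class GlobalFeatures(nn.Module):
    """Stand-in for a DINO encoder: one pooled embedding per image."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Conv2d(3, 64, kernel_size=8, stride=8)

    def forward(self, x):
        # Average over spatial positions -> (B, 64) global embedding.
        return self.proj(x).mean(dim=(2, 3))

def perceptual_loss(local_net, global_net, pred, target,
                    w_local=1.0, w_global=1.0):
    # LPIPS-style local term: distance between channel-normalized feature maps.
    local = sum(F.mse_loss(F.normalize(fp, dim=1), F.normalize(ft, dim=1))
                for fp, ft in zip(local_net(pred), local_net(target)))
    # DINO-style global term: cosine distance between pooled embeddings.
    gp, gt = global_net(pred), global_net(target)
    global_term = (1 - F.cosine_similarity(gp, gt, dim=1)).mean()
    return w_local * local + w_global * global_term
```

In practice the stand-in networks would be replaced by the frozen pretrained LPIPS and DINO encoders, with the combined loss applied between the denoised prediction and the clean image during diffusion training.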
+ ## Checkpoints
+
+ | Dataset       | Model           | Params | Performance                            |
+ |---------------|-----------------|--------|----------------------------------------|
+ | ImageNet256   | PixelGen-XL/16  | 676M   | 5.11 FID (w/o CFG) / 1.83 FID (w/ CFG) |
+ | Text-to-Image | PixelGen-XXL/16 | 1.1B   | 0.79 GenEval Score                     |
+
+ ## Usage
+
+ For detailed environment setup and training, please refer to the [official GitHub repository](https://github.com/Zehong-Ma/PixelGen).
+
+ ### Inference
+
+ You can run inference using the provided configuration files and checkpoints:
+
+ ```bash
+ # Inference without CFG, using the 80-epoch checkpoint
+ python main.py predict -c ./configs_c2i/PixelGen_XL_without_CFG.yaml --ckpt_path=./ckpts/PixelGen_XL_80ep.ckpt
+
+ # Inference with CFG, using the 160-epoch checkpoint
+ python main.py predict -c ./configs_c2i/PixelGen_XL.yaml --ckpt_path=./ckpts/PixelGen_XL_160ep.ckpt
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{ma2026pixelgen,
+   title={PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss},
+   author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
+   year={2026},
+   eprint={2602.02493},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2602.02493},
+ }
+ ```