Improve model card and add metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +54 -3
README.md CHANGED
@@ -1,3 +1,54 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: unconditional-image-generation
+ tags:
+ - image-generation
+ - pixel-diffusion
+ ---
+
+ # PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
+
+ PixelGen is a simple pixel diffusion framework that generates images directly in pixel space. Because it needs no VAE, it avoids the artifacts and bottlenecks that VAEs introduce in latent diffusion models. To guide training, it adds two complementary perceptual losses: an LPIPS loss for local patterns and a DINO-based perceptual loss for global semantics.
+
+ [**Project Page**](https://zehong-ma.github.io/PixelGen/) | [**Paper**](https://huggingface.co/papers/2602.02493) | [**GitHub**](https://github.com/Zehong-Ma/PixelGen)
+
+ ## Introduction
+ PixelGen achieves results competitive with latent diffusion models by modeling a more meaningful perceptual manifold rather than the full, high-dimensional pixel manifold. Key highlights include:
+ - **FID 5.11** on ImageNet-256 without classifier-free guidance (CFG) after only 80 training epochs.
+ - **FID 1.83** on ImageNet-256 with CFG.
+ - **GenEval score of 0.79** on large-scale text-to-image generation.
+
+ ## Checkpoints
+
+ | Dataset       | Model           | Params | Performance                            |
+ |---------------|-----------------|--------|----------------------------------------|
+ | ImageNet-256  | PixelGen-XL/16  | 676M   | 5.11 FID (w/o CFG) / 1.83 FID (w/ CFG) |
+ | Text-to-Image | PixelGen-XXL/16 | 1.1B   | 0.79 GenEval Score                     |
+
+ ## Usage
+
+ For environment setup and training details, see the [official GitHub repository](https://github.com/Zehong-Ma/PixelGen).
+
+ ### Inference
+ Run inference with the provided configuration files and checkpoints:
+
+ ```bash
+ # Inference without CFG, using the 80-epoch checkpoint
+ python main.py predict -c ./configs_c2i/PixelGen_XL_without_CFG.yaml --ckpt_path=./ckpts/PixelGen_XL_80ep.ckpt
+
+ # Inference with CFG, using the 160-epoch checkpoint
+ python main.py predict -c ./configs_c2i/PixelGen_XL.yaml --ckpt_path=./ckpts/PixelGen_XL_160ep.ckpt
+ ```
+
+ ## Citation
+ ```bibtex
+ @article{ma2026pixelgen,
+   title={PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss},
+   author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
+   year={2026},
+   eprint={2602.02493},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV},
+   url={https://arxiv.org/abs/2602.02493},
+ }
+ ```
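
The two inference commands in the README each pair a config file with a specific checkpoint, and it is easy to mix them up. The following is a minimal sketch of a wrapper that keeps the pairs together and reports missing files before launching. The names `RUNS`, `build_predict_command`, and `missing_files` are hypothetical helpers introduced here for illustration and are not part of the PixelGen repository; only the paths and the `python main.py predict` invocation are taken from the README.

```python
import shlex
from pathlib import Path

# Config/checkpoint pairs copied verbatim from the README's inference commands.
# The dictionary keys are hypothetical labels, not names used by the repository.
RUNS = {
    "without_cfg": (
        "./configs_c2i/PixelGen_XL_without_CFG.yaml",
        "./ckpts/PixelGen_XL_80ep.ckpt",
    ),
    "with_cfg": (
        "./configs_c2i/PixelGen_XL.yaml",
        "./ckpts/PixelGen_XL_160ep.ckpt",
    ),
}


def build_predict_command(mode):
    """Return the argv list for `python main.py predict` in the given mode."""
    config, ckpt = RUNS[mode]
    return ["python", "main.py", "predict", "-c", config, f"--ckpt_path={ckpt}"]


def missing_files(mode, root="."):
    """List the config/checkpoint paths for `mode` that are absent under `root`."""
    return [path for path in RUNS[mode] if not (Path(root) / path).is_file()]


if __name__ == "__main__":
    for mode in RUNS:
        # Print the exact shell command, plus any files that still need downloading.
        print(shlex.join(build_predict_command(mode)))
        for path in missing_files(mode):
            print(f"  missing: {path}")
```

Run from the repository root, the script prints each command as it appears in the README and flags any config or checkpoint that is not yet in place; the argv list can be passed to `subprocess.run` once nothing is missing.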