---
license: mit
library_name: diffusers
tags:
  - diffusers
  - image-generation
  - class-conditional
  - imagenet
  - pixnerd
language:
  - en
---

# PixNerd-XL-16 Diffusers Checkpoints

A production-ready Diffusers export of the PixNerd-XL/16 class-conditional ImageNet checkpoints.

## Available Checkpoints

- `PixNerd-XL-16-256`
  - source: `epoch%3D319-step%3D1600000_emainit.ckpt`
  - target resolution: `256x256`
- `PixNerd-XL-16-512`
  - source: `res512_ft200k_epoch%3D325-step%3D1800000_emainit.ckpt`
  - target resolution: `512x512`

Both checkpoints are packaged with:

- `pipeline.py`
- `modeling_pixnerd_transformer_2d.py`
- `scheduling_pixnerd_flow_match.py`
- `transformer/` weights + config
- `scheduler/` config
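Put together, each checkpoint directory roughly follows the standard Diffusers layout (a sketch; the exact weight file names may differ in the actual export):

```
PixNerd-XL-16-256/
├── pipeline.py
├── modeling_pixnerd_transformer_2d.py
├── scheduling_pixnerd_flow_match.py
├── conversion_metadata.json
├── transformer/
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
└── scheduler/
    └── scheduler_config.json
```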

## Requirements

```bash
pip install torch diffusers
```

## Inference (Python)

```python
import torch
from diffusers import DiffusionPipeline

model_dir = "PixNerd-XL-16-256"  # or PixNerd-XL-16-512
pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=f"{model_dir}/pipeline.py",
    torch_dtype=torch.float32,
).to("cpu")  # use "cuda" if available

# Class-conditional generation: class label 207 (golden retriever)
images = pipe(
    prompt=[207],
    num_images_per_prompt=1,
    height=256,
    width=256,
    num_inference_steps=25,
    guidance_scale=4.0,
    timeshift=3.0,
    order=2,
).images

images[0].save("sample.png")
```

## Interface Notes

- The pipeline uses the `prompt` argument for its conditioning input.
- For class-conditional generation, pass integer ImageNet class labels, e.g. `prompt=[207]`.
- `height` and `width` should match the checkpoint's training resolution (256 or 512); other sizes may still work as long as both dimensions are divisible by the patch size.
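Since PixNerd-XL/16 operates on 16x16 pixel patches (inferred from the model name), a quick divisibility check can validate a custom resolution before invoking the pipeline. This is a minimal sketch; `is_valid_size` is a hypothetical helper, not part of the shipped API:

```python
PATCH_SIZE = 16  # assumption: the "/16" in PixNerd-XL/16 denotes a 16x16 patch grid


def is_valid_size(height: int, width: int, patch_size: int = PATCH_SIZE) -> bool:
    """Return True if both dimensions align with the transformer's patch grid."""
    return height % patch_size == 0 and width % patch_size == 0


print(is_valid_size(256, 256))  # True: native resolution of the 256 checkpoint
print(is_valid_size(512, 512))  # True: native resolution of the 512 checkpoint
print(is_valid_size(300, 300))  # False: 300 is not divisible by 16
```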

## Reproducibility Metadata

- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.
- Transformer and scheduler runtime classes are defined in repository-local Python modules shipped with each checkpoint.

## Limitations

- Intended for ImageNet class-conditional generation.
- No text encoder is included.
- Output quality depends on scheduler settings and inference step count.

## Citation

Source paper (ICLR 2026):

- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)

Source code:

- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)

```bibtex
@article{wang2025pixnerd,
  author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  title  = {PixNerd: Pixel Neural Field Diffusion},
  year   = {2025},
  eprint = {arXiv:2507.23268},
}
```