| --- |
| license: mit |
| library_name: diffusers |
| tags: |
| - diffusers |
| - image-generation |
| - class-conditional |
| - imagenet |
| - pixnerd |
| language: |
| - en |
| --- |
| |
| # PixNerd-XL-16 Diffusers Checkpoints |
|
|
A production-ready Diffusers export of the PixNerd-XL/16 class-conditional ImageNet checkpoints.
|
|
| ## Available Checkpoints |
|
|
| - `PixNerd-XL-16-256` |
  - source: `epoch=319-step=1600000_emainit.ckpt`
| - target resolution: `256x256` |
| - `PixNerd-XL-16-512` |
  - source: `res512_ft200k_epoch=325-step=1800000_emainit.ckpt`
| - target resolution: `512x512` |
|
|
| Both checkpoints are packaged with: |
|
|
| - `pipeline.py` |
| - `modeling_pixnerd_transformer_2d.py` |
| - `scheduling_pixnerd_flow_match.py` |
| - `transformer/` weights + config |
| - `scheduler/` config |
|
|
| ## Requirements |
|
|
| ```bash |
| pip install torch diffusers |
| ``` |
|
|
| ## Inference (Python) |
|
|
| ```python |
| import torch |
| from diffusers import DiffusionPipeline |
| |
| model_dir = "PixNerd-XL-16-256" # or PixNerd-XL-16-512 |
| pipe = DiffusionPipeline.from_pretrained( |
| model_dir, |
| custom_pipeline=f"{model_dir}/pipeline.py", |
| torch_dtype=torch.float32, |
| ).to("cpu") # use "cuda" if available |
| |
| # Class-conditional generation: class label 207 (golden retriever) |
| images = pipe( |
| prompt=[207], |
| num_images_per_prompt=1, |
| height=256, |
| width=256, |
| num_inference_steps=25, |
| guidance_scale=4.0, |
| timeshift=3.0, |
| order=2, |
| ).images |
| |
| images[0].save("sample.png") |
| ``` |
|
|
| ## Interface Notes |
|
|
| - The pipeline uses `prompt` for conditioning input. |
| - For class-conditional generation, pass integer labels, e.g. `prompt=[207]`. |
- `height` and `width` should match the checkpoint's training resolution (256 or 512), but other sizes work as long as both dimensions are divisible by the patch size (16 for these checkpoints).
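Since conditioning is by integer class index, it can be useful to validate labels before calling the pipeline. The helper below is a minimal sketch (not part of the export), assuming the standard 1000-class ImageNet-1k index space used by class-conditional diffusion models:

```python
def validate_class_labels(labels):
    """Ensure every conditioning label is a valid ImageNet-1k class index (0-999)."""
    for label in labels:
        if not isinstance(label, int) or not 0 <= label < 1000:
            raise ValueError(f"invalid ImageNet class label: {label!r}")
    return labels

# 207 = golden retriever, as in the inference example above
validate_class_labels([207])
```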
|
|
| ## Reproducibility Metadata |
|
|
| - Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`. |
| - Transformer and scheduler runtime classes are defined in repository-local Python modules shipped with each checkpoint. |
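To inspect the provenance record, the metadata file can be read with the standard library. This is a sketch that assumes you run it from the checkpoint's parent directory; the exact keys inside `conversion_metadata.json` depend on the conversion script:

```python
import json
from pathlib import Path

# Metadata file shipped inside each checkpoint directory
meta_path = Path("PixNerd-XL-16-256") / "conversion_metadata.json"

if meta_path.exists():
    metadata = json.loads(meta_path.read_text())
    # Print the top-level entries; key names vary with the conversion script
    for key in sorted(metadata):
        print(f"{key}: {metadata[key]}")
else:
    print(f"{meta_path} not found; run from the checkpoint's parent directory.")
```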
|
|
| ## Limitations |
|
|
| - Intended for ImageNet class-conditional generation. |
| - No text encoder is included. |
| - Output quality depends on scheduler settings and inference step count. |
|
|
| ## Citation |
|
|
| Source paper (ICLR 2026): |
|
|
| - [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268) |
| - [Hugging Face Papers page](https://huggingface.co/papers/2507.23268) |
|
|
| Source code: |
|
|
| - Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd) |
| - Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers) |
|
|
```bibtex
@article{wang2025pixnerd,
  author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  title  = {PixNerd: Pixel Neural Field Diffusion},
  year   = {2025},
  eprint = {arXiv:2507.23268},
}
```
|
|