---
license: mit
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- pixnerd
language:
- en
---

# PixNerd-XL-16 Diffusers Checkpoints

Diffusers-format export of the PixNerd-XL/16 class-conditional ImageNet checkpoints, ready to load with `DiffusionPipeline.from_pretrained` and the bundled custom pipeline.

## Available Checkpoints

- `PixNerd-XL-16-256`
  - source: `epoch=319-step=1600000_emainit.ckpt`
  - target resolution: `256x256`
- `PixNerd-XL-16-512`
  - source: `res512_ft200k_epoch=325-step=1800000_emainit.ckpt`
  - target resolution: `512x512`
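
When scripting against both exports, it can help to keep each folder's target resolution in one place so that `height` and `width` always match the checkpoint. A minimal sketch; `CHECKPOINT_RESOLUTIONS` and `pick_checkpoint` are hypothetical names, with values taken from the checkpoint list above.

```python
# Hypothetical helper: map each exported folder to its target resolution
# (values from the "Available Checkpoints" list) and select one by size.
CHECKPOINT_RESOLUTIONS = {
    "PixNerd-XL-16-256": 256,
    "PixNerd-XL-16-512": 512,
}


def pick_checkpoint(resolution: int) -> str:
    """Return the checkpoint folder whose target resolution equals `resolution`."""
    for name, res in CHECKPOINT_RESOLUTIONS.items():
        if res == resolution:
            return name
    supported = sorted(CHECKPOINT_RESOLUTIONS.values())
    raise ValueError(f"no checkpoint targets {resolution}px; available: {supported}")


print(pick_checkpoint(512))  # PixNerd-XL-16-512
```

The returned name can then serve as `model_dir`, with the same number passed as `height` and `width`.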

Both checkpoints are packaged with:

- `pipeline.py`
- `modeling_pixnerd_transformer_2d.py`
- `scheduling_pixnerd_flow_match.py`
- `transformer/` (weights and config)
- `scheduler/` (config)
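
Before loading, a quick check that a downloaded folder is complete can save a confusing error from `from_pretrained`. A minimal sketch, assuming the folder also contains the `model_index.json` that `DiffusionPipeline.from_pretrained` reads to assemble a pipeline; `missing_entries` is a hypothetical helper.

```python
from pathlib import Path

# Entries listed above for each checkpoint, plus model_index.json
# (assumed present, since DiffusionPipeline.from_pretrained requires it).
EXPECTED_ENTRIES = [
    "pipeline.py",
    "modeling_pixnerd_transformer_2d.py",
    "scheduling_pixnerd_flow_match.py",
    "transformer",
    "scheduler",
    "model_index.json",
]


def missing_entries(model_dir: str) -> list:
    """Return the expected files/folders that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Usage: an empty list means the folder has everything listed above.
# print(missing_entries("PixNerd-XL-16-256"))
```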

## Requirements

```bash
pip install torch diffusers
```
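
The inference example loads `model_dir` from a local folder. One way to fetch a single checkpoint folder is `huggingface_hub.snapshot_download` (`huggingface_hub` is installed as a dependency of diffusers). A sketch, assuming the checkpoint folders sit at the top level of the `BiliSakura/PixNerd-diffusers` Hub repository; `download_checkpoint` is a hypothetical helper.

```python
import os


def download_checkpoint(name: str, local_dir: str = ".") -> str:
    """Fetch one checkpoint folder from the Hub and return its local path.

    Assumes folders such as "PixNerd-XL-16-256" sit at the top level of the
    BiliSakura/PixNerd-diffusers repository.
    """
    # Imported lazily so this module loads even where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    snapshot_download(
        repo_id="BiliSakura/PixNerd-diffusers",
        allow_patterns=[f"{name}/*"],  # download only this checkpoint's files
        local_dir=local_dir,
    )
    # Files keep their repo-relative paths, so the folder lands under local_dir.
    return os.path.join(local_dir, name)
```

The returned path can be passed directly as `model_dir`, e.g. `model_dir = download_checkpoint("PixNerd-XL-16-256")`.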

## Inference (Python)

```python
import torch
from diffusers import DiffusionPipeline

model_dir = "PixNerd-XL-16-256"  # local checkpoint folder; or "PixNerd-XL-16-512"
pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=f"{model_dir}/pipeline.py",  # bundled custom pipeline
    torch_dtype=torch.float32,
).to("cpu")  # use "cuda" if available

# Class-conditional generation: ImageNet class 207 (golden retriever)
images = pipe(
    prompt=[207],
    num_images_per_prompt=1,
    height=256,  # use 512 with PixNerd-XL-16-512
    width=256,
    num_inference_steps=25,
    guidance_scale=4.0,
    timeshift=3.0,  # timeshift and order are extra options of the bundled pipeline.py
    order=2,
).images

images[0].save("sample.png")
```

## Interface Notes

- The pipeline takes its conditioning input through the `prompt` argument.
- For class-conditional generation, pass a list of integer ImageNet class labels, e.g. `prompt=[207]`.
- `height` and `width` should match the checkpoint's target resolution (256 for `PixNerd-XL-16-256`, 512 for `PixNerd-XL-16-512`); other sizes also work as long as both are divisible by the patch size.
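
A minimal pre-flight check for these notes might look like the following. `check_request`, `PATCH_SIZE = 16`, and `NUM_CLASSES = 1000` are assumptions: the patch size is inferred from the "/16" in the model name (DiT-style naming) and should be confirmed against the transformer config, and the label range assumes the standard 1,000-class ImageNet label set. The sketch only accepts the list form of `prompt` shown above.

```python
# Hypothetical pre-flight check mirroring the interface notes.
# ASSUMPTION: patch size 16, inferred from "PixNerd-XL/16"; confirm it in
# transformer/config.json (the exact key name may differ).
PATCH_SIZE = 16
NUM_CLASSES = 1000  # ASSUMPTION: ImageNet-1k labels 0..999


def check_request(prompt, height: int, width: int, patch_size: int = PATCH_SIZE) -> None:
    """Raise ValueError if the inputs violate the interface notes."""
    if not isinstance(prompt, (list, tuple)) or not prompt:
        raise ValueError("prompt must be a non-empty list of integer class labels")
    for label in prompt:
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(label, bool) or not isinstance(label, int) or not 0 <= label < NUM_CLASSES:
            raise ValueError(f"invalid ImageNet class label: {label!r}")
    for name, size in (("height", height), ("width", width)):
        if size % patch_size != 0:
            raise ValueError(f"{name}={size} is not divisible by patch size {patch_size}")


check_request([207], 256, 256)  # valid: returns None without raising
```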

## Reproducibility Metadata

- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.
- The transformer and scheduler runtime classes are defined in repository-local Python modules (`modeling_pixnerd_transformer_2d.py` and `scheduling_pixnerd_flow_match.py`) shipped with each checkpoint.
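
To inspect the recorded provenance programmatically, a plain JSON read is enough. A minimal sketch; `load_conversion_metadata` is a hypothetical helper, and because the file's fields are not documented here, it makes no assumptions about their names.

```python
import json
from pathlib import Path


def load_conversion_metadata(model_dir: str) -> dict:
    """Read conversion_metadata.json from a checkpoint folder."""
    path = Path(model_dir) / "conversion_metadata.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Usage: list the top-level fields before relying on any of them.
# meta = load_conversion_metadata("PixNerd-XL-16-256")
# print(sorted(meta))
```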

## Limitations

- Intended for ImageNet class-conditional generation.
- No text encoder is included.
- Output quality depends on scheduler settings and inference step count.
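
One way to see how these settings affect a given class is a small grid sweep with everything else held fixed. A sketch under stated assumptions: `sweep_configs` and `run_sweep` are hypothetical helpers, the grid values are illustrative rather than recommended, and treating `timeshift` as one of the "scheduler settings" is an assumption. `pipe` is a pipeline loaded as in the inference example.

```python
import os
from itertools import product

# Illustrative grids, not tuned recommendations.
# ASSUMPTION: timeshift counts as one of the "scheduler settings".
STEP_COUNTS = [10, 25, 50]
TIMESHIFTS = [1.0, 3.0]


def sweep_configs(steps=STEP_COUNTS, timeshifts=TIMESHIFTS) -> list:
    """Return one kwargs dict per (num_inference_steps, timeshift) pair."""
    return [
        {"num_inference_steps": n, "timeshift": t}
        for n, t in product(steps, timeshifts)
    ]


def run_sweep(pipe, out_dir: str, label: int = 207, resolution: int = 256) -> list:
    """Generate one image per config for a fixed class label; return saved paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for cfg in sweep_configs():
        result = pipe(
            prompt=[label],
            height=resolution,
            width=resolution,
            guidance_scale=4.0,  # held at the value used in the inference example
            order=2,
            **cfg,
        )
        name = f"class{label}_steps{cfg['num_inference_steps']}_ts{cfg['timeshift']}.png"
        path = os.path.join(out_dir, name)
        result.images[0].save(path)
        paths.append(path)
    return paths
```

For example, `run_sweep(pipe, "sweep")` writes six images into `sweep/`. Without a fixed random seed, differences between the images mix the effect of the settings with sampling noise.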

## Citation

Source paper (ICLR 2026):

- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)

Source code:

- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)

```bibtex
@article{2507.23268,
  Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  Title = {PixNerd: Pixel Neural Field Diffusion},
  Year = {2025},
  Eprint = {arXiv:2507.23268},
}
```