---
license: mit
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- pixnerd
language:
- en
---

# PixNerd-XL-16 Diffusers Checkpoints

Production-ready Diffusers export of PixNerd-XL/16 class-conditional ImageNet checkpoints.

## Available Checkpoints

- `PixNerd-XL-16-256`
  - source: `epoch%3D319-step%3D1600000_emainit.ckpt`
  - target resolution: `256x256`
- `PixNerd-XL-16-512`
  - source: `res512_ft200k_epoch%3D325-step%3D1800000_emainit.ckpt`
  - target resolution: `512x512`

Both checkpoints are packaged with:

- `pipeline.py`
- `modeling_pixnerd_transformer_2d.py`
- `scheduling_pixnerd_flow_match.py`
- `transformer/` weights + config
- `scheduler/` config

## Requirements

```bash
pip install torch diffusers
```

## Inference (Python)

```python
import torch
from diffusers import DiffusionPipeline

model_dir = "PixNerd-XL-16-256"  # or PixNerd-XL-16-512
pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=f"{model_dir}/pipeline.py",
    torch_dtype=torch.float32,
).to("cpu")  # use "cuda" if available

# Class-conditional generation: class label 207 (golden retriever)
images = pipe(
    prompt=[207],
    num_images_per_prompt=1,
    height=256,
    width=256,
    num_inference_steps=25,
    guidance_scale=4.0,
    timeshift=3.0,
    order=2,
).images

images[0].save("sample.png")
```

## Interface Notes

- The pipeline uses `prompt` for conditioning input.
- For class-conditional generation, pass integer labels, e.g. `prompt=[207]`.
- `height` and `width` should match checkpoint intent (256 or 512), but custom sizes work if divisible by patch size.

## Reproducibility Metadata

- Architecture and conversion provenance are recorded in each checkpoint's `conversion_metadata.json`.
- Transformer and scheduler runtime classes are defined in repository-local Python modules shipped with each checkpoint.

## Limitations

- Intended for ImageNet class-conditional generation.
- No text encoder is included.
- Output quality depends on scheduler settings and inference step count.

## Citation

Source paper (ICLR 2026):

- [PixNerd: Pixel Neural Field Diffusion](http://arxiv.org/abs/2507.23268)
- [Hugging Face Papers page](https://huggingface.co/papers/2507.23268)

Source code:

- Original PixNerd codebase: [MCG-NJU/PixNerd](https://github.com/MCG-NJU/PixNerd)
- Diffusers conversion code used for this export: [Bili-Sakura/PixNerd-diffusers](https://github.com/Bili-Sakura/PixNerd-diffusers)

```bibtex
@article{2507.23268,
  Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  Title = {PixNerd: Pixel Neural Field Diffusion},
  Year = {2025},
  Eprint = {arXiv:2507.23268},
}
```
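
## Checking Custom Sizes

The Interface Notes say that custom `height` and `width` values work if they are divisible by the patch size. The sketch below shows one way to check a size before calling the pipeline. It assumes a patch size of 16, inferred from the "/16" in the model name; the authoritative value should be read from the `transformer/` config. `check_size` and `nearest_valid` are hypothetical helpers for illustration and are not part of the shipped `pipeline.py`.

```python
def check_size(height: int, width: int, patch_size: int = 16) -> tuple[int, int]:
    """Return the patch grid (rows, cols) for a requested image size.

    patch_size=16 is an assumption based on the model name, not a value
    confirmed by the checkpoint config.
    """
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % patch_size != 0:
            raise ValueError(
                f"{name}={value} must be a positive multiple of {patch_size}"
            )
    return height // patch_size, width // patch_size


def nearest_valid(value: int, patch_size: int = 16) -> int:
    """Round a side length to the nearest positive multiple of patch_size."""
    rounded = (value + patch_size // 2) // patch_size * patch_size
    return max(patch_size, rounded)


if __name__ == "__main__":
    print(check_size(256, 256))  # patch grid for the 256x256 checkpoint
    print(nearest_valid(300))    # closest usable side length to 300
```

If `check_size` raises, passing `nearest_valid(height)` and `nearest_valid(width)` to the pipeline instead is one option, at the cost of a slightly different aspect ratio.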