
license: mit
library_name: diffusers
tags:
  - diffusers
  - image-generation
  - class-conditional
  - imagenet
  - pixnerd
language:
  - en

PixNerd-XL-16 Diffusers Checkpoints

Diffusers export of the PixNerd-XL/16 class-conditional ImageNet checkpoints. Each checkpoint loads with DiffusionPipeline and the custom pipeline code shipped alongside it.

Available Checkpoints

PixNerd-XL-16-256
- source: epoch=319-step=1600000_emainit.ckpt
- target resolution: 256x256
PixNerd-XL-16-512
- source: res512_ft200k_epoch=325-step=1800000_emainit.ckpt
- target resolution: 512x512
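If you switch between the two exports in code, a small lookup keeps the checkpoint directory and the generation size in sync. This is a minimal sketch. CHECKPOINTS and checkpoint_for are hypothetical helper names, and the directory names assume both checkpoints sit in the working directory under the names listed above.

```python
# Hypothetical helper: map a target resolution to the matching export directory.
# Assumes the checkpoint folders use the names listed in this model card.
CHECKPOINTS = {
    256: "PixNerd-XL-16-256",
    512: "PixNerd-XL-16-512",
}


def checkpoint_for(resolution: int) -> str:
    """Return the checkpoint directory trained for `resolution` (256 or 512)."""
    try:
        return CHECKPOINTS[resolution]
    except KeyError:
        supported = ", ".join(str(r) for r in sorted(CHECKPOINTS))
        raise ValueError(
            f"no checkpoint for {resolution}; supported resolutions: {supported}"
        ) from None


model_dir = checkpoint_for(512)  # "PixNerd-XL-16-512"
```

Raising on unknown sizes is a deliberate choice here: the card lists only 256 and 512 as training resolutions, so anything else is treated as a caller mistake rather than silently mapped to the nearest checkpoint.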

Both checkpoints are packaged with:

- pipeline.py
- modeling_pixnerd_transformer_2d.py
- scheduling_pixnerd_flow_match.py
- transformer/: weights and config
- scheduler/: config
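Because the pipeline, transformer, and scheduler classes come from these bundled files, a checkpoint folder missing any of them will fail to load. The sketch below checks a local folder against the layout listed above before calling from_pretrained. It only tests that the entries exist, not what the transformer/ and scheduler/ folders contain, and missing_entries is a hypothetical helper name.

```python
from pathlib import Path

# Entries this model card lists for each checkpoint directory.
EXPECTED_FILES = (
    "pipeline.py",
    "modeling_pixnerd_transformer_2d.py",
    "scheduling_pixnerd_flow_match.py",
)
EXPECTED_DIRS = ("transformer", "scheduler")


def missing_entries(model_dir: str) -> list[str]:
    """Return the expected files/folders that are absent from `model_dir`."""
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # Folders are reported with a trailing slash to distinguish them from files.
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

For example, missing_entries("PixNerd-XL-16-256") returns an empty list when the folder matches the layout above.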

Requirements

pip install torch diffusers

Inference (Python)

import torch
from diffusers import DiffusionPipeline

model_dir = "PixNerd-XL-16-256"  # or "PixNerd-XL-16-512"
resolution = 256                 # use 512 with PixNerd-XL-16-512
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the checkpoint together with its repository-local pipeline.py
pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=f"{model_dir}/pipeline.py",
    torch_dtype=torch.float32,
).to(device)

# Class-conditional generation: ImageNet class label 207 (golden retriever)
images = pipe(
    prompt=[207],
    num_images_per_prompt=1,
    height=resolution,
    width=resolution,
    num_inference_steps=25,
    guidance_scale=4.0,
    timeshift=3.0,
    order=2,
).images

images[0].save("sample.png")

Interface Notes

- The pipeline takes its conditioning input through the prompt argument.
- For class-conditional generation, pass a list of integer ImageNet class labels, e.g. prompt=[207].
- Set height and width to the checkpoint's training resolution (256 or 512). Other sizes are accepted if both dimensions are divisible by the patch size (16 for PixNerd-XL/16).
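The constraints in these notes can be checked before an expensive pipeline call. The following is a hypothetical pre-flight check, not part of the shipped pipeline. It assumes ImageNet-1k labels (indices 0 to 999) and a patch size of 16, read from the "/16" in the model name.

```python
# Hypothetical pre-flight check; the limits below are assumptions, not pipeline code.
NUM_IMAGENET_CLASSES = 1000  # assumed ImageNet-1k: valid labels are 0..999
PATCH_SIZE = 16              # the "/16" in PixNerd-XL/16


def validate_request(labels, height, width, patch_size=PATCH_SIZE):
    """Raise ValueError if the labels or image size look invalid for this pipeline."""
    for label in labels:
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(label, int) or isinstance(label, bool):
            raise ValueError(f"class label must be an int, got {label!r}")
        if not 0 <= label < NUM_IMAGENET_CLASSES:
            raise ValueError(
                f"class label {label} outside 0..{NUM_IMAGENET_CLASSES - 1}"
            )
    for name, size in (("height", height), ("width", width)):
        if size <= 0 or size % patch_size != 0:
            raise ValueError(f"{name}={size} is not a positive multiple of {patch_size}")


validate_request([207], height=512, width=512)  # valid request: no exception
```

A call such as validate_request([207], height=250, width=256) raises, since 250 is not a multiple of 16.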

Reproducibility Metadata

Architecture and conversion provenance are recorded in each checkpoint's conversion_metadata.json.
Transformer and scheduler runtime classes are defined in repository-local Python modules shipped with each checkpoint.
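To inspect this provenance without assuming any particular schema, the sketch below reads conversion_metadata.json and flattens it into readable lines. The helper names are hypothetical, and it assumes the file's top level is a JSON object. The actual keys are whatever the conversion script wrote, so none are hard-coded.

```python
import json
from pathlib import Path


def load_conversion_metadata(model_dir: str) -> dict:
    """Read conversion_metadata.json from a checkpoint directory."""
    path = Path(model_dir) / "conversion_metadata.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def describe(metadata: dict, indent: str = "") -> list[str]:
    """Flatten nested metadata into 'key: value' lines without assuming key names."""
    lines = []
    for key, value in metadata.items():
        if isinstance(value, dict):
            # Nested objects get their own header line and deeper indentation.
            lines.append(f"{indent}{key}:")
            lines.extend(describe(value, indent + "  "))
        else:
            lines.append(f"{indent}{key}: {value}")
    return lines
```

Usage: print("\n".join(describe(load_conversion_metadata("PixNerd-XL-16-256")))).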

Limitations

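- Intended for ImageNet class-conditional generation only.
- No text encoder is included, so text prompts are not supported.
- Output quality depends on the scheduler and sampling settings (e.g. guidance_scale, timeshift, order) and on num_inference_steps.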
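Since quality depends on these settings, a small grid sweep is one way to compare them for a given class label. This is a sketch under assumptions: it expects pipe to be loaded as in the inference example and to return an object whose images list holds savable images, as that example does. The default grid values other than 25 steps, guidance 4.0, and timeshift 3.0 are arbitrary placeholders, and both function names are hypothetical.

```python
from itertools import product


def sweep_settings(steps=(25, 50), guidance_scales=(2.0, 4.0), timeshifts=(3.0,)):
    """Yield one keyword-argument dict per combination of sampler settings."""
    for n, g, t in product(steps, guidance_scales, timeshifts):
        yield {"num_inference_steps": n, "guidance_scale": g, "timeshift": t}


def run_sweep(pipe, label=207, resolution=256, **grid):
    """Generate one image per setting; encode the settings in each filename."""
    for kwargs in sweep_settings(**grid):
        result = pipe(
            prompt=[label],
            height=resolution,
            width=resolution,
            order=2,  # kept fixed, as in the inference example
            **kwargs,
        )
        name = "steps{num_inference_steps}_cfg{guidance_scale}_shift{timeshift}.png"
        result.images[0].save(name.format(**kwargs))
```

Keeping the generator of settings separate from the loop that calls the pipeline makes the grid easy to inspect (list(sweep_settings())) before spending GPU time on it.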

Citation

Source paper (ICLR 2026): PixNerd: Pixel Neural Field Diffusion, arXiv:2507.23268
Source code:
- Original PixNerd codebase: MCG-NJU/PixNerd
- Diffusers conversion code used for this export: Bili-Sakura/PixNerd-diffusers

@article{2507.23268,
  Author = {Shuai Wang and Ziteng Gao and Chenhui Zhu and Weilin Huang and Limin Wang},
  Title = {PixNerd: Pixel Neural Field Diffusion},
  Year = {2025},
  Eprint = {2507.23268},
  ArchivePrefix = {arXiv},
}