---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- nit
pipeline_tag: unconditional-image-generation
widget:
- output:
    url: demo_images/demo_sde250_class207_seed42.png
---

# NiT-XL Diffusers (Class-Conditional)

A Native-resolution Image Transformer (NiT-XL) checkpoint packaged as a Diffusers-style repository with vendored custom code.

## What is included

- `transformer/`: `NiTTransformer2DModel` weights + config
- `scheduler/`: `NiTFlowMatchScheduler` config
- `vae/`: `AutoencoderDC` weights + config
- `custom_pipeline/`: local, self-contained implementations of:
  - `NiTPipeline`
  - `NiTTransformer2DModel`
  - `NiTFlowMatchScheduler`
- `test_inference.py`: standalone sampling script

This repository does **not** depend on an external `NiT-diffusers` checkout during inference. It includes a root `pipeline.py` custom entrypoint for Diffusers dynamic loading.

## Quickstart

### 1) Environment

Install dependencies (example):

```bash
pip install torch diffusers safetensors
```

If using this project's environment:

```bash
conda activate rsgen
```

### 2) Generate a demo image

Run from this repository root:

```bash
python test_inference.py \
  --class-label 207 \
  --height 512 \
  --width 512 \
  --steps 250 \
  --mode sde \
  --guidance-scale 2.05 \
  --guidance-low 0.0 \
  --guidance-high 0.7 \
  --output demo_images/demo_sde250_class207_seed42.png
```

## Python usage

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path(".").resolve()
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = (
    torch.bfloat16
    if device == "cuda" and torch.cuda.is_bf16_supported()
    else torch.float32
)

pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=str(model_dir / "pipeline.py"),
    local_files_only=True,
).to(device)

if device == "cuda":
    pipe.transformer.to(dtype=dtype)
    pipe.vae.to(dtype=dtype)

gen = torch.Generator(device=device).manual_seed(42)
result = pipe(
    class_labels=[207],
    height=512,
    width=512,
    num_inference_steps=250,
    mode="sde",
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=gen,
)
result.images[0].save("demo_images/sample.png")
```

For remote Hub loading:

```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-XL-diffusers",
    custom_pipeline="pipeline",
)
```

## Recommended inference settings

- Resolution: `512x512`
- Mode: `sde`
- Steps: `250`
- Guidance scale: `2.05`
- Guidance interval: `(0.0, 0.7)`

Using a very low step count (for example `2`) is only a smoke test and will produce low-quality images.

## Demo

![NiT-XL demo image](demo_images/demo_sde250_class207_seed42.png)

## Citation

If you use this model or the NiT method in your work, please cite:

```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## Notes

- This is a class-conditional generator (ImageNet label ids), not a text-to-image model.
- For reproducibility, set `--seed`.
- The vendored custom pipeline keeps inference behavior consistent without external code dependencies.
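The `guidance_interval` setting gates classifier-free guidance to part of the sampling trajectory. As an illustration of the idea only (not the vendored pipeline's exact implementation, which lives in `pipeline.py`), the sketch below applies guidance when a normalized timestep falls inside the interval; `cfg_mix` is a hypothetical helper name, and the normalized-time convention is an assumption.

```python
# Illustrative sketch of interval-gated classifier-free guidance (CFG).
# Assumption: guidance_interval bounds a normalized time t in [0, 1];
# the actual convention in the vendored pipeline may differ.

def cfg_mix(uncond: float, cond: float, scale: float, t: float,
            interval: tuple[float, float]) -> float:
    """Blend unconditional and conditional predictions.

    Guidance is applied only when t lies inside `interval`; outside it,
    the conditional prediction is used unchanged (effective scale 1.0).
    """
    low, high = interval
    if low <= t <= high:
        return uncond + scale * (cond - uncond)
    return cond

# With the recommended settings (scale 2.05, interval (0.0, 0.7)),
# a step at t = 0.5 is guided while a step at t = 0.9 is not.
guided = cfg_mix(uncond=0.0, cond=1.0, scale=2.05, t=0.5, interval=(0.0, 0.7))
unguided = cfg_mix(uncond=0.0, cond=1.0, scale=2.05, t=0.9, interval=(0.0, 0.7))
```

Restricting guidance to the early-to-middle portion of sampling is a common way to keep global structure conditioned on the class label while letting the final steps run unguided.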