---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- nit
pipeline_tag: unconditional-image-generation
widget:
- output:
url: demo_images/demo_sde250_class207_seed42.png
---
# NiT-XL Diffusers (Class-Conditional)
Native-resolution Image Transformer (NiT-XL) checkpoint packaged as a Diffusers-style repository with vendored custom code.
## What is included
- `transformer/`: `NiTTransformer2DModel` weights + config
- `scheduler/`: `NiTFlowMatchScheduler` config
- `vae/`: `AutoencoderDC` weights + config
- `custom_pipeline/`: local, self-contained implementations of:
- `NiTPipeline`
- `NiTTransformer2DModel`
- `NiTFlowMatchScheduler`
- `test_inference.py`: standalone sampling script
This repository does **not** depend on an external `NiT-diffusers` checkout during inference.
It includes a root `pipeline.py` custom entrypoint for Diffusers dynamic loading.
## Quickstart
### 1) Environment
Install dependencies (example):
```bash
pip install torch diffusers safetensors
```
If you are using this project's conda environment instead:
```bash
conda activate rsgen
```
### 2) Generate a demo image
Run from this repository root:
```bash
python test_inference.py \
  --class-label 207 \
  --height 512 \
  --width 512 \
  --steps 250 \
  --mode sde \
  --guidance-scale 2.05 \
  --guidance-low 0.0 \
  --guidance-high 0.7 \
  --output demo_images/demo_sde250_class207_seed42.png
```
## Python usage
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path(".").resolve()
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" and torch.cuda.is_bf16_supported() else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=str(model_dir / "pipeline.py"),
    local_files_only=True,
).to(device)
if device == "cuda":
    pipe.transformer.to(dtype=dtype)
    pipe.vae.to(dtype=dtype)

gen = torch.Generator(device=device).manual_seed(42)
result = pipe(
    class_labels=[207],
    height=512,
    width=512,
    num_inference_steps=250,
    mode="sde",
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=gen,
)
result.images[0].save("demo_images/sample.png")
```
For remote Hub loading:
```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-XL-diffusers",
    custom_pipeline="pipeline",
)
```
## Recommended inference settings
- Resolution: `512x512`
- Mode: `sde`
- Steps: `250`
- Guidance scale: `2.05`
- Guidance interval: `(0.0, 0.7)`
Using very low step counts (for example `2`) is only a smoke test and will produce low-quality images.
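As a rough illustration of how a guidance interval like `(0.0, 0.7)` restricts guidance to part of the sampling trajectory, the sketch below treats the interval as a fraction of normalized step progress. This is an assumption for illustration only; the vendored `NiTFlowMatchScheduler` may define the window over sigma or timestep values instead.

```python
def guided_step_indices(num_steps: int, low: float, high: float) -> list[int]:
    """Indices whose normalized progress t = i / (num_steps - 1) lies in [low, high].

    Illustrative only: the actual scheduler may interpret the guidance
    interval differently (e.g. over sigma values rather than step fractions).
    """
    return [i for i in range(num_steps) if low <= i / (num_steps - 1) <= high]

# With 250 steps and interval (0.0, 0.7), guidance covers the first 175 steps.
steps = guided_step_indices(250, 0.0, 0.7)
print(len(steps), steps[0], steps[-1])  # → 175 0 174
```

Under this reading, raising `--guidance-high` extends guidance later into sampling, while raising `--guidance-low` skips guidance on the earliest, noisiest steps.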
## Demo
![NiT-XL demo image](demo_images/demo_sde250_class207_seed42.png)
## Citation
If you use this model or the NiT method in your work, please cite:
```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## Notes
- This is a class-conditional generator (ImageNet label ids), not a text-to-image model.
- For reproducibility, set `--seed`.
- The vendored custom pipeline keeps inference behavior consistent without external code dependencies.
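Since the model conditions on ImageNet label ids, a small lookup table can make scripts more readable. The ids below follow the standard ImageNet-1k ordering (the demo's `207` is "golden retriever" in that ordering); this helper is hypothetical and not part of this repository, so verify ids against the label map your checkpoint was trained with.

```python
# A few ImageNet-1k class ids in the standard ordering (illustrative subset;
# verify against your own label map before relying on these).
IMAGENET_EXAMPLES = {
    207: "golden retriever",
    281: "tabby cat",
    817: "sports car",
}

def describe(class_label: int) -> str:
    """Human-readable name for a known class id, else a generic tag."""
    return IMAGENET_EXAMPLES.get(class_label, f"class {class_label}")

print(describe(207))  # → golden retriever
```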