---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- nit
pipeline_tag: unconditional-image-generation
widget:
- output:
url: demo_images/demo_sde250_class207_seed42.png
---
# NiT-XL Diffusers (Class-Conditional)
Native-resolution Image Transformer (NiT-XL) checkpoint packaged as a Diffusers-style repository with vendored custom code.
## What is included
- `transformer/`: `NiTTransformer2DModel` weights + config
- `scheduler/`: `NiTFlowMatchScheduler` config
- `vae/`: `AutoencoderDC` weights + config
- `custom_pipeline/`: local, self-contained implementations of:
- `NiTPipeline`
- `NiTTransformer2DModel`
- `NiTFlowMatchScheduler`
- `test_inference.py`: standalone sampling script
This repository does **not** depend on an external `NiT-diffusers` checkout during inference.
It includes a root `pipeline.py` custom entrypoint for Diffusers dynamic loading.
## Quickstart
### 1) Environment
Install dependencies (example):
```bash
pip install torch diffusers safetensors
```
If you are using this project's conda environment instead:
```bash
conda activate rsgen
```
### 2) Generate a demo image
Run from this repository root:
```bash
python test_inference.py \
  --class-label 207 \
  --height 512 \
  --width 512 \
  --steps 250 \
  --mode sde \
  --guidance-scale 2.05 \
  --guidance-low 0.0 \
  --guidance-high 0.7 \
  --output demo_images/demo_sde250_class207_seed42.png
```
## Python usage
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path(".").resolve()
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" and torch.cuda.is_bf16_supported() else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=str(model_dir / "pipeline.py"),
    local_files_only=True,
).to(device)
if device == "cuda":
    pipe.transformer.to(dtype=dtype)
    pipe.vae.to(dtype=dtype)

gen = torch.Generator(device=device).manual_seed(42)
result = pipe(
    class_labels=[207],
    height=512,
    width=512,
    num_inference_steps=250,
    mode="sde",
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=gen,
)
result.images[0].save("demo_images/sample.png")
```
For remote Hub loading:
```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-XL-diffusers",
    custom_pipeline="pipeline",
)
```
## Recommended inference settings
- Resolution: `512x512`
- Mode: `sde`
- Steps: `250`
- Guidance scale: `2.05`
- Guidance interval: `(0.0, 0.7)`
Using very low step counts (for example `2`) is only a smoke test and will produce low-quality images.
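As a rough illustration of how a guidance interval like `(0.0, 0.7)` restricts guidance to part of the sampling trajectory, the sketch below treats the interval as a fraction of normalized step progress. This is an assumption for illustration only; the vendored `NiTFlowMatchScheduler` may define the window over sigma or timestep values instead.

```python
def guided_step_indices(num_steps: int, low: float, high: float) -> list[int]:
    """Indices whose normalized progress t = i / (num_steps - 1) lies in [low, high].

    Illustrative only: the actual scheduler may interpret the guidance
    interval differently (e.g. over sigma values rather than step fractions).
    """
    return [i for i in range(num_steps) if low <= i / (num_steps - 1) <= high]

# With 250 steps and interval (0.0, 0.7), guidance covers the first 175 steps.
steps = guided_step_indices(250, 0.0, 0.7)
print(len(steps), steps[0], steps[-1])  # → 175 0 174
```

Under this reading, raising `--guidance-high` extends guidance later into sampling, while raising `--guidance-low` skips guidance on the earliest, noisiest steps.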
## Demo
![NiT-XL demo image](demo_images/demo_sde250_class207_seed42.png)
## Citation
If you use this model or the NiT method in your work, please cite:
```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## Notes
- This is a class-conditional generator (ImageNet label ids), not a text-to-image model.
- For reproducibility, set `--seed`.
- The vendored custom pipeline keeps inference behavior consistent without external code dependencies.
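Since the model conditions on ImageNet label ids, a small lookup table can make scripts more readable. The ids below follow the standard ImageNet-1k ordering (the demo's `207` is "golden retriever" in that ordering); this helper is hypothetical and not part of this repository, so verify ids against the label map your checkpoint was trained with.

```python
# A few ImageNet-1k class ids in the standard ordering (illustrative subset;
# verify against your own label map before relying on these).
IMAGENET_EXAMPLES = {
    207: "golden retriever",
    281: "tabby cat",
    817: "sports car",
}

def describe(class_label: int) -> str:
    """Human-readable name for a known class id, else a generic tag."""
    return IMAGENET_EXAMPLES.get(class_label, f"class {class_label}")

print(describe(207))  # → golden retriever
```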